Skip to content

Conversation

@addvanced
Copy link
Contributor

@addvanced addvanced commented Aug 17, 2025

Description:

  • Added new functionality:

    • AddSubstitution(key, value string)
    • RemoveSubstitution(key string)
    • ReplaceSubstitution(key, newValue string)
    • SetSubstitutions(substitutions map[string]string)
  • Refactored how Slugger is initialized

  • Replaced strings.ReplaceAll()-loop with strings.Replacer.

  • Added a ligatureReplacer map for some manual character replacement (e.g. ø = oe)

  • Better handling of keys, that are now sorted ascending and by word length, so e.g. && is dealt with before & etc.

  • Other minor cleanups here and there

Checklist:

  • Tests Passing: Verify by running make test.
  • Golint Passing: Confirm by running make lint.

If added a new utility function or updated existing function logic:

  • Updated the utility package EXAMPLES.md
  • Updated the utility package README.md

If added a new utility package:

  • Updated the main README.md utility packages table

Summary by CodeRabbit

  • New Features

    • Runtime management of substitutions (add/remove/replace/set), concurrency-safe updates, case-insensitive longest-match substitutions, ligature-aware ASCII normalization, flexible separator selection, and optional emoji handling.
  • Documentation

    • Added examples demonstrating dynamic substitution changes and updated API usage.
  • Tests

    • Expanded edge-case coverage, concurrency stress tests, and parallelized test scenarios.
  • Chores

    • Updated dependency declarations.

@addvanced addvanced marked this pull request as ready for review August 18, 2025 03:56
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 18, 2025

Walkthrough

Adds concurrency-safe, dynamic substitution management to Slugger via a mutex-guarded cached strings.Replacer, a new New(substitutions, withEmoji) constructor, public substitution mutators, ligature + NFKD→ASCII normalization, updated slug flow, expanded tests/benchmarks, and extra docs/examples. Public API expanded; slug generation semantics adjusted accordingly.

Changes

Cohort / File(s) Summary
Core implementation
slugger/slugger.go
Introduces mu sync.RWMutex and cached replacer *strings.Replacer; new New(substitutions map[string]string, withEmoji bool) constructor; mutators AddSubstitution, RemoveSubstitution, ReplaceSubstitution, SetSubstitutions; initReplacer (longest-first), ligatureReplacer, normalizeWordsToSafeASCII, copyMap, and defaultSeparator; slug flow now applies cached replacer and ASCII-safe normalization. Imports updated (cmp, slices, sync).
Tests & benchmarks
slugger/slugger_test.go
Tests and benchmarks migrated to New(...); added cases for dynamic substitution lifecycle (add/remove/replace/clear), expanded edge-case coverage (empty/whitespace, NFKD diacritics, longest-match precedence, emoji handling, custom separators), parallelized tests, and a concurrency stress test validating concurrent reads/writes.
Docs — API
slugger/README.md
README updated to document new constructor signature and public substitution-management methods (AddSubstitution, RemoveSubstitution, ReplaceSubstitution, SetSubstitutions) and example usage.
Docs — Examples
slugger/EXAMPLES.md
Appends two example sections (Remove stop words duplicate and Change substitutions dynamically) with Go snippets and outputs; documentation-only additions.
Module file
go.mod
Adds golang.org/x/sync v0.16.0 to require.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Slugger
  participant Replacer
  participant Normalizer

  rect rgb(240,248,255)
  Note over User,Slugger: Substitution management (writers)
  User->>Slugger: AddSubstitution / RemoveSubstitution / ReplaceSubstitution / SetSubstitutions
  Slugger-->>Slugger: Lock mu, mutate substitutions, invalidate replacer
  end

  rect rgb(245,255,240)
  Note over User,Slugger: Slug generation (readers)
  User->>Slugger: Slug(input, separator)
  Slugger->>Slugger: RLock -> ensure replacer built (initReplacer if nil)
  Slugger->>Replacer: apply substitutions (if present)
  Replacer-->>Slugger: substituted text
  Slugger->>Normalizer: ligature replace -> NFKD normalize -> ASCII extract words
  Normalizer-->>Slugger: []words
  Slugger-->>User: join(words, chosenSeparator)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • kashifkhan0771
  • shahzadhaider1

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (12)
slugger/EXAMPLES.md (1)

165-170: Minor formatting nit: inconsistent indentation inside code block

Two lines use 4 spaces instead of tabs/consistent indentation, which stands out in the rendered snippet.

Apply this diff to normalize indentation:

-    s.AddSubstitution("the", "")
+	s.AddSubstitution("the", "")
@@
-    s.ReplaceSubstitution("the", "a")
+	s.ReplaceSubstitution("the", "a")
slugger/README.md (1)

17-33: APIs documented clearly; add a thread-safety note for mutators

Docs for Add/Remove/Replace/Set look good.

Given these methods mutate shared state, please add a short note that Slugger is not safe for concurrent use when these are called concurrently with Slug. Either document this or guarantee safety in code.

Suggested addition under “Slugger Methods”:

  • Thread-safety: Calling AddSubstitution, RemoveSubstitution, ReplaceSubstitution, or SetSubstitutions concurrently with Slug on the same Slugger instance is not safe. Use separate instances or external synchronization.
slugger/slugger.go (10)

36-65: Data race possibility when mutating substitutions concurrently with Slug

Slug uses sync.Once to lazily build the replacer, and mutators reset the Once. Without synchronization around sl.replacer/Sl.Substitutions access, concurrent Slug and Add/Remove/Replace/Set can race, leading to undefined behavior.

Two options:

  • Document that Slugger is not safe for concurrent mutation and usage.
  • Or guard state with a mutex. Minimal locking around replacer access:
 func (sl *Slugger) Slug(s, separator string) string {
   if s == "" {
     return ""
   }

   if sl.WithEmoji {
     s = gomoji.ReplaceEmojisWithSlug(s)
   }

   s = strings.ToLower(s)

-  sl.init.Do(sl.initReplacer)
-  if sl.replacer != nil {
-    s = sl.replacer.Replace(s)
-  }
+  sl.init.Do(sl.initReplacer)
+  // Optionally guard replacer read if mutators may run concurrently:
+  // sl.mu.RLock()
+  // r := sl.replacer
+  // sl.mu.RUnlock()
+  // if r != nil {
+  //   s = r.Replace(s)
+  // }
+  if sl.replacer != nil {
+    s = sl.replacer.Replace(s)
+  }

If you want, I can follow up with a full RWMutex integration across mutators and initReplacer.


66-75: Eagerly nil-out replacer when mutating to avoid stale state

Resetting sl.init is good; also consider setting sl.replacer = nil so a subsequent Slug call can’t accidentally reuse a stale replacer if init isn’t triggered (e.g., via early returns).

Apply this diff:

 func (sl *Slugger) AddSubstitution(key, value string) {
   if sl.Substitutions == nil {
     sl.Substitutions = make(map[string]string)
   }

   sl.Substitutions[key] = value
   sl.init = sync.Once{}
+  sl.replacer = nil
 }

Repeat the same for RemoveSubstitution, ReplaceSubstitution, and SetSubstitutions.


76-86: No-op guards are fine; consider early nil-out and doc behavior

Behavior is fine. If key doesn’t exist, nothing happens. Consider explicitly documenting that Replace/Remove on a non-existent key are no-ops.

Apply this diff to proactively clear the cached replacer as well:

 if _, exists := sl.Substitutions[key]; exists {
   delete(sl.Substitutions, key)
   sl.init = sync.Once{}
+  sl.replacer = nil
 }

88-98: ReplaceSubstitution no-op for missing keys; consider Add+Replace or doc it

It’s fine to require the key to exist; alternatively, you could treat Replace as Add-or-Replace. If keeping current behavior, please document it and also clear the cached replacer.

 if _, exists := sl.Substitutions[key]; exists {
   sl.Substitutions[key] = newValue
   sl.init = sync.Once{}
+  sl.replacer = nil
 }

100-104: SetSubstitutions shares the caller’s map — consider defensive copy

Assigning the map by reference means external mutations affect Slugger’s internal state, potentially mid-flight. Consider copying to avoid surprises.

 func (sl *Slugger) SetSubstitutions(substitutions map[string]string) {
-  sl.Substitutions = substitutions
+  if substitutions == nil {
+    sl.Substitutions = nil
+  } else {
+    // defensive copy
+    cp := make(map[string]string, len(substitutions))
+    for k, v := range substitutions {
+      cp[k] = v
+    }
+    sl.Substitutions = cp
+  }
   sl.init = sync.Once{} // Reset the initialization to rebuild the replacer
+  sl.replacer = nil
 }

106-114: Consider adding a few more common ligatures/specials and dashes

Nice touch. Consider adding œ → oe, ff/fi/fl → ff/fi/fl and mapping en/em-dash (–, —) and hyphen-minus variants (‐) to a regular hyphen to avoid token merging.

 var ligatureReplacer = strings.NewReplacer(
   "æ", "ae",
   "ø", "oe",
   "ß", "ss",
+  "œ", "oe",
+  "ff", "ff",
+  "fi", "fi",
+  "fl", "fl",
+  "–", "-",
+  "—", "-",
+  "‐", "-",
 )

115-139: Hyphen retention can yield triple separators (e.g., “A - B” → “a---b”)

normalizeWordsToSafeASCII keeps ASCII “-_.“ and splits on whitespace, so an input like “A - B” produces tokens ["a","-","b"] and joining with "-" becomes "a---b". If you want to avoid this, either drop "-" from keep and rely solely on the joiner, or normalize standalone hyphens to spaces before Fields.

One approach: transform isolated hyphens to spaces prior to Fields, or remove "-" from keep and explicitly inject separators later.


155-162: Trim/normalize substitution values when building the replacer

Users may pass values with surrounding spaces. Trimming avoids double-spacing and inconsistent tokenization.

 for k, v := range sl.Substitutions {
   if k == "" {
     continue
   }
-  subsPairs = append(subsPairs, subsKV{k: k, v: strings.ToLower(v)})
+  subsPairs = append(subsPairs, subsKV{
+    k: k,
+    v: strings.ToLower(strings.TrimSpace(v)),
+  })
 }

167-173: Longest-first sort should consider the lowered key length for rare Unicode

You sort by len(a.k), but replacements are matched on strings.ToLower(key). For unusual keys where lowercasing expands length (e.g., Turkish İ), ordering could be off. Low impact, but easy to fix.

-slices.SortFunc(subsPairs, func(a, b subsKV) int {
-  if la, lb := len(a.k), len(b.k); la != lb {
-    return cmp.Compare(lb, la) // sort by key length DESC
-  }
-  return cmp.Compare(a.k, b.k)
-})
+slices.SortFunc(subsPairs, func(a, b subsKV) int {
+  ak, bk := strings.ToLower(a.k), strings.ToLower(b.k)
+  if la, lb := len(ak), len(bk); la != lb {
+    return cmp.Compare(lb, la) // sort by key length DESC
+  }
+  return cmp.Compare(ak, bk)
+})

175-179: Leading space in replacement is intentional; consider documenting pitfall

Replacing key with " "+value can cause concatenation when the key is adjacent to a word (e.g., "a&b" -> "a andb"). Your tests prefer this behavior (e.g., "#HelloWorld" → "hashtaghelloworld"). Consider calling it out in README so users know how adjacency is handled.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 846bd9b and 8219808.

📒 Files selected for processing (4)
  • slugger/EXAMPLES.md (1 hunks)
  • slugger/README.md (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.524Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.524Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/README.md
  • slugger/EXAMPLES.md
  • slugger/slugger_test.go
  • slugger/slugger.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.

Applied to files:

  • slugger/slugger.go
🧬 Code Graph Analysis (1)
slugger/slugger_test.go (1)
slugger/slugger.go (1)
  • New (28-34)
🔇 Additional comments (4)
slugger/slugger.go (1)

3-8: Go version requirement satisfied

  • go.mod specifies go 1.24
  • CI is using Go 1.24.6, which includes both cmp/slices APIs (Go 1.21+) and testing.B.Loop
    No changes required.
slugger/slugger_test.go (3)

98-101: t.Parallel in subtests: nice for speeding up the suite

Parallelizing subtests is appropriate here and safe since each test uses its own Slugger instance.


126-205: Edge-case coverage is strong

Good coverage for whitespace collapsing, diacritics removal, case-insensitive substitutions, longest-match behavior, emoji off-path, and multi-character separators. This materially reduces regression risk.


207-229: b.Loop is supported in Go 1.24; no fallback required
The module’s go.mod declares Go 1.24 and testing.B.Loop was introduced in Go 1.24, so these benchmarks will compile as-is under your CI toolchain.

@addvanced addvanced force-pushed the feature/update-slugger branch from 8219808 to 7b8e054 Compare August 18, 2025 04:41
@addvanced addvanced force-pushed the feature/update-slugger branch 2 times, most recently from 0d18b1d to f733604 Compare August 18, 2025 04:53
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
slugger/slugger_test.go (1)

100-120: Consider adding a race-focused test to exercise concurrent updates vs. reads.

Given the new dynamic mutation APIs, add a targeted race test to validate thread safety under -race.

Example (drop into this file) to run with go test -race:

func TestSlugger_ConcurrentUpdates(t *testing.T) {
	t.Parallel()
	sl := New(map[string]string{"&": "and"}, false)

	done := make(chan struct{})
	defer close(done)

	// Readers
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 10_000; j++ {
				_ = sl.Slug("A && B & C", "-")
			}
		}()
	}

	// Writers
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(k int) {
			defer wg.Done()
			for j := 0; j < 2_000; j++ {
				switch j % 3 {
				case 0:
					sl.AddSubstitution("&&", "andand")
				case 1:
					sl.ReplaceSubstitution("&", "and")
				default:
					sl.RemoveSubstitution("&&")
				}
			}
		}(i)
	}

	wg.Wait()
}

This will help surface any data races between Slug and the mutation methods.

slugger/slugger.go (3)

118-130: Normalize substitution keys/values to lowercase at the API boundary for consistency.

Input is lowercased before replacement, but your map stores keys verbatim. Adding with "FOO" then removing with "foo" will fail. Canonicalizing at write-time simplifies reasoning and avoids duplicate-case entries.

Apply:

 func (sl *Slugger) AddSubstitution(key, value string) {
   sl.mu.Lock()
   defer sl.mu.Unlock()
 
   if sl.Substitutions == nil {
     sl.Substitutions = make(map[string]string)
   }
 
-  sl.Substitutions[key] = value
+  key = strings.ToLower(key)
+  value = strings.ToLower(value)
+  sl.Substitutions[key] = value
   sl.replacer = nil
 }
 
 func (sl *Slugger) RemoveSubstitution(key string) {
   sl.mu.Lock()
   defer sl.mu.Unlock()
 
   if len(sl.Substitutions) == 0 {
     return
   }
 
-  if _, exists := sl.Substitutions[key]; exists {
+  key = strings.ToLower(key)
+  if _, exists := sl.Substitutions[key]; exists {
     delete(sl.Substitutions, key)
     sl.replacer = nil
   }
 }
 
 func (sl *Slugger) ReplaceSubstitution(key, newValue string) {
   sl.mu.Lock()
   defer sl.mu.Unlock()
 
   if len(sl.Substitutions) == 0 {
     return
   }
 
-  if _, exists := sl.Substitutions[key]; exists {
-    sl.Substitutions[key] = newValue
+  key = strings.ToLower(key)
+  newValue = strings.ToLower(newValue)
+  if _, exists := sl.Substitutions[key]; exists {
+    sl.Substitutions[key] = newValue
     sl.replacer = nil
   }
 }
 
 func (sl *Slugger) SetSubstitutions(substitutions map[string]string) {
   sl.mu.Lock()
   defer sl.mu.Unlock()
 
   if substitutions == nil {
     sl.Substitutions = nil
   } else {
     cp := make(map[string]string, len(substitutions))
     for k, v := range substitutions {
-      cp[k] = v
+      cp[strings.ToLower(k)] = strings.ToLower(v)
     }
     sl.Substitutions = cp
   }
 
   sl.replacer = nil
 }

Also applies to: 71-84, 85-99, 101-115


203-211: Minor: avoid repeated ToLower in sort comparator; precompute lowercased keys.

strings.ToLower in the comparator is called O(n log n) times. Precomputing lowers for keys/values once reduces overhead for larger substitution maps.

Example refactor:

- type subsKV struct {
-   k, v string
- }
+ type subsKV struct {
+   k, v string // original
+   lk, lv string // lowercased
+ }
 ...
- for k, v := range sl.Substitutions {
+ for k, v := range sl.Substitutions {
     if k == "" {
       continue
     }
-    subsPairs = append(subsPairs, subsKV{
-      k: k,
-      v: strings.ToLower(v),
-    })
+    subsPairs = append(subsPairs, subsKV{
+      k:  k,
+      v:  v,
+      lk: strings.ToLower(k),
+      lv: strings.ToLower(v),
+    })
 }
 ...
- slices.SortFunc(subsPairs, func(a, b subsKV) int {
-   ak, bk := strings.ToLower(a.k), strings.ToLower(b.k)
-   if la, lb := len(ak), len(bk); la != lb {
+ slices.SortFunc(subsPairs, func(a, b subsKV) int {
+   if la, lb := len(a.lk), len(b.lk); la != lb {
      return cmp.Compare(lb, la) // sort by key length DESC
    }
-   return cmp.Compare(ak, bk)
+   return cmp.Compare(a.lk, b.lk)
 })
 ...
- subs = append(subs, strings.ToLower(sub.k), " "+sub.v)
+ subs = append(subs, sub.lk, " "+sub.lv)

Also applies to: 188-198, 212-215


214-215: Intentional leading space in replacer output—document the behavior.

Using " "+sub.v intentionally forces word boundaries so strings.Fields can collapse to separators. This is clever but non-obvious. Consider adding a short note in README and here to make this contract explicit (especially for users expecting exact in-place substitution).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8219808 and 1d88375.

📒 Files selected for processing (4)
  • slugger/EXAMPLES.md (1 hunks)
  • slugger/README.md (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • slugger/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • slugger/EXAMPLES.md
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/slugger.go
  • slugger/slugger_test.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.

Applied to files:

  • slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (1)
slugger/slugger.go (1)
  • New (29-35)
🔇 Additional comments (5)
slugger/slugger_test.go (3)

62-95: Great coverage of dynamic substitutions, emoji handling, normalization, and longest-match logic.

The table-driven tests and edge cases align with the new API surface and expected behaviors, including longest-match precedence (&& vs &), ASCII normalization, and multi-char separators.

Also applies to: 128-168


211-216: Benchmark b.Loop() is supported by Go 1.24
Your go.mod specifies go 1.24, which includes support for testing.B.Loop()—no changes are needed.


98-106: Tests are safe under Go 1.24; no rebind needed
The go.mod file specifies Go 1.24, which includes the per-iteration variable binding semantics introduced in Go 1.22. The parallel subtests in slugger/slugger_test.go (lines 98–106 and 198–206) correctly capture tt, so the suggested tt := tt workaround is unnecessary.

slugger/slugger.go (2)

171-181: Longest-match precedence looks good; strings.Replacer will prefer earlier-inserted patterns at the same index.

Sorting by key length DESC then lexicographically ensures "&&" wins over "&" at the same position. This matches your edge-case tests.

Also applies to: 203-211


68-69: No action needed—Go 1.24 covers both cmp.Or’s generics requirement and testing.B.Loop.

The module’s go.mod specifies Go 1.24, which:

  • Exceeds Go 1.18’s introduction of generics (so your cmp.Or helper is fully supported).
  • Exceeds Go 1.21’s addition of testing.B.Loop() (so your benchmarks will compile as written).

You can safely leave the code—and the go directive—unchanged.

@addvanced addvanced force-pushed the feature/update-slugger branch 2 times, most recently from 3870ebb to ee6cb70 Compare August 22, 2025 04:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
slugger/slugger.go (1)

129-130: Nit: fix ligature example in comment (œ → oe, not ae).

The code maps ø→oe and æ→ae correctly; the comment’s example for œ should be “œ → oe”.

Apply:

-// Add more if you run into them (e.g. œ → ae, ß → ss).
+// Add more if you run into them (e.g. œ → oe, ß → ss).
🧹 Nitpick comments (2)
slugger/slugger.go (1)

188-195: Optional: consider rune-length ordering if substitutions include multi-byte keys.

You sort by byte length (len). If users add multi-byte keys (e.g., “€” vs “€” + extra), byte-length ordering may not reflect “longest by characters.” Given strings.Replacer compares bytes, current logic is consistent; this is only relevant if a “character length” notion is desired.

Would you prefer rune count ordering?

- if la, lb := len(a.k), len(b.k); la != lb {
+ if la, lb := len([]rune(a.k)), len([]rune(b.k)); la != lb {
   return cmp.Compare(lb, la)
 }
slugger/slugger_test.go (1)

126-205: Edge-case suite is strong; consider adding a concurrent update/read stress test.

Given the package now supports dynamic updates, a race-guard test that calls Slug concurrently with Add/Replace/Remove would be valuable (and will catch the locking issues noted). I can draft it if you want.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1d88375 and ee6cb70.

📒 Files selected for processing (4)
  • slugger/EXAMPLES.md (1 hunks)
  • slugger/README.md (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • slugger/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • slugger/EXAMPLES.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/slugger_test.go
  • slugger/slugger.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.

Applied to files:

  • slugger/slugger.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.

Applied to files:

  • slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (1)
slugger/slugger.go (1)
  • New (28-34)
🔇 Additional comments (5)
slugger/slugger.go (3)

27-33: Good: constructor copies caller map.

copyMap in New avoids external mutation/data races and mirrors SetSubstitutions semantics. This is the right call.


136-159: LGTM: normalization pipeline is robust and focused.

Ligature folding, NFKD decomposition, ASCII filtering, and Fields split are clean and predictable; tests cover the intended behavior.


71-72: No Go version gate needed—module and CI already target Go ≥1.22

go.mod declares “go 1.24”, satisfying the requirement for cmp.Or (added in Go 1.22).
GitHub Actions workflows use actions/setup-go@v5 with go-version-file: 'go.mod', so they automatically install Go 1.24 as specified. No .gitlab-ci.yml is present to adjust.

All checks pass; no changes needed.

slugger/slugger_test.go (2)

7-93: Tests read well and cover core scenarios.

Nice coverage: defaults, custom separators, emoji path, normalization, and dynamic substitution mutations. Using per-test Slugger instances plus t.Parallel is tidy.


210-212: No changes needed—Go 1.24 is already in use

I’ve verified:

  • go.mod declares go 1.24
  • Both GitHub workflows (.github/workflows/go.yml and golangci-lint.yml) use actions/setup-go@v5 with go-version-file: 'go.mod'

Since CI picks up Go 1.24 from go.mod, testing.B.Loop() at lines 210, 218, and 226 will compile without requiring fallbacks.

@addvanced addvanced force-pushed the feature/update-slugger branch from ee6cb70 to d8a618b Compare August 22, 2025 05:45
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
slugger/slugger_test.go (3)

3-11: Prep imports for a fail-fast, context-driven concurrency test.

You’ll need context here if you switch TestSluggerConcurrentSubstitutions to errgroup.WithContext (see suggestion below). Also: the range-over-integers and b.Loop usages elsewhere are valid with go 1.24+, which you’ve set in go.mod. (tip.golang.org, pkg.go.dev)

Apply:

 import (
-	"fmt"
+	"context"
+	"fmt"
 	"testing"
 	"time"
 	"unicode"
 
 	"github.com/kashifkhan0771/utils/rand"
 	"golang.org/x/sync/errgroup"
 )

213-329: Make the concurrency stress test fail fast and non-flaky (use errgroup.WithContext + timeout).

Currently, if a reader detects an error early, other goroutines keep spinning until the 3s sleep elapses because the group has no cancellation path. Switching to a context-driven loop lets the test (a) stop immediately on error and (b) run for a bounded duration without explicit sleeps. Also avoids leaked goroutines if the test fails.

Note: “for range 8/4” is fine in Go ≥1.22, and b.Loop is fine in Go ≥1.24; your go.mod uses toolchain go1.24.6, which is compatible and includes fixes for early b.Loop issues in 1.24.0. (tip.golang.org, pkg.go.dev, github.com)

Apply:

 func TestSluggerConcurrentSubstitutions(t *testing.T) {
 	t.Parallel()
@@
-	var (
-		eg   = errgroup.Group{}
-		stop = make(chan struct{})
-	)
+	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+	defer cancel()
+	eg, ctx := errgroup.WithContext(ctx)
@@
-	for range 8 {
+	for range 8 {
 		eg.Go(func() error {
 			for {
-				select {
-				case <-stop:
-					return nil
-				default:
-				}
+				select {
+				case <-ctx.Done():
+					return nil
+				default:
+				}
@@
-	for range 4 {
+	for range 4 {
 		eg.Go(func() error {
 			keys := []string{"hello", "😀", "æ", "foo", "bar", "baz", "qux", "quux"}
 
 			for {
-				select {
-				case <-stop:
-					return nil
-				default:
-				}
+				select {
+				case <-ctx.Done():
+					return nil
+				default:
+				}
@@
-	// Run until deadline.
-	time.Sleep(3 * time.Second)
-	close(stop)
-
-	// Wait for all goroutines to finish.
-	if err := eg.Wait(); err != nil {
+	// Wait for completion (either timeout or earlier failure).
+	if err := eg.Wait(); err != nil {
 		t.Fatal(err)
 	}

Optional: derive the timeout from t.Deadline() when available to play nicer with -timeout and -race.


120-124: Minor: add a focused ReplaceSubstitution test.

You exercise ReplaceSubstitution under load, but a deterministic unit test would catch subtle ordering/case-insensitivity bugs without relying on randomness. Example to add as another table row:

{
  name:          "ReplaceSubstitution updates value only",
  input:         "Price is 10 %",
  separator:     "-",
  substitutions: map[string]string{"%": "percent"},
  substitutionsChange: map[string]string{"%": "pct"},
  expected:      "price-is-10-pct",
}

Happy to draft the exact subtest if you prefer a separate TestSlugger_ReplaceSubstitution.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ee6cb70 and d8a618b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • go.mod (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • slugger/slugger.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.

Applied to files:

  • go.mod
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/slugger_test.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
  • New (28-34)
rand/rand.go (2)
  • Pick (77-89)
  • NumberInRange (42-64)
🔇 Additional comments (5)
go.mod (1)

13-16: Commit the updated go.mod after running go mod tidy

The go mod tidy execution has already:

  • Added golang.org/x/sync v0.16.0 into the primary require block (dropping the // indirect annotation).
  • Removed the separate indirect-only require block for x/sync and consolidated the module requirements.

No further manual edits are needed—please commit the updated go.mod.

slugger/slugger_test.go (4)

66-73: Nice coverage for SetSubstitutions semantics.

This validates whole-map replacement and case-insensitive keys. Good guard for API regressions in the new substitutions API.


75-91: Good targeted tests for Add/Remove path.

Exercising AddSubstitution and RemoveSubstitution separately helps catch ordering and cache invalidation bugs in the replacer build path. Well done.


179-184: Great “longest-match wins” assertion.

This will catch any regression in key ordering (e.g., processing “&” before “&&”). It aligns with the move to strings.Replacer and sorted keys.


331-353: Adopting testing.B.Loop is the right call for Go 1.24+.

b.Loop is purpose-built to avoid benchmark gotchas (timer management, DCE) and should be preferred over b.N. Your go.mod’s toolchain (go1.24.6) is sufficient. (pkg.go.dev)

If you haven’t already, consider adding b.ReportAllocs() to one benchmark to track allocator effects of the new replacer path.

@addvanced addvanced force-pushed the feature/update-slugger branch 2 times, most recently from 6856be9 to 654ff48 Compare August 22, 2025 06:19
@addvanced
Copy link
Contributor Author

@kashifkhan0771 I feel like this should be ready as well now. :) Let's see if the bunny rabbit has any input - if not, then all is good from here. :)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
slugger/slugger.go (1)

139-146: Fix comment: œ maps to “oe”, not “ae”. Optionally include œ in the table.

The example currently says “œ → ae”. Correct mapping is “œ → oe”. Also consider actually adding “œ” to ligatureReplacer for completeness (NFKD typically decomposes it, but being explicit is clearer).

Apply:

-// Add more if you run into them (e.g. œ → ae, ß → ss).
+// Add more if you run into them (e.g. œ → oe, ß → ss).

Optionally:

 var ligatureReplacer = strings.NewReplacer(
 	"æ", "ae",
 	"ø", "oe",
 	"ß", "ss",
+	"œ", "oe",
 )
🧹 Nitpick comments (5)
slugger/slugger.go (3)

53-61: Avoid redundant rebuilds under contention (double-check under lock).

Today, multiple Slug callers seeing r == nil will all call initReplacer. Each will take the lock and rebuild even if another goroutine already built it moments earlier. It’s safe but wasteful. Add an early check under the write lock and remove the internal “set to nil” to prevent unnecessary rebuilds.

Apply:

 func (sl *Slugger) initReplacer() {
-	sl.mu.Lock()
-	defer sl.mu.Unlock()
-
-	// Reset the replacer to nil so that it can be rebuilt with the latest substitutions
-	sl.replacer = nil
+	sl.mu.Lock()
+	// Another goroutine might have already built it between the RUnlock and this Lock.
+	if sl.replacer != nil {
+		sl.mu.Unlock()
+		return
+	}
+	defer sl.mu.Unlock()

 	if len(sl.Substitutions) == 0 {
 		return
 	}

203-209: Comparator allocates on every compare; precompute rune lengths.

len([]rune(a.k)) allocates; it runs many times during sort. Precompute rune counts once when building subsPairs and compare those ints.

Apply:

@@
-	// struct to hold key-value pairs for sorting
-	type subsKV struct {
-		k, v string
-	}
+	// struct to hold key-value pairs plus precomputed rune length for sorting
+	type subsKV struct {
+		k, v string
+		n    int
+	}
@@
-	for k, v := range sl.Substitutions {
+	for k, v := range sl.Substitutions {
 		if k == "" {
 			continue
 		}
-		subsPairs = append(subsPairs, subsKV{k: k, v: strings.ToLower(v)})
+		subsPairs = append(subsPairs, subsKV{
+			k: k,
+			v: strings.ToLower(v),
+			n: utf8.RuneCountInString(k),
+		})
 	}
@@
-	slices.SortFunc(subsPairs, func(a, b subsKV) int {
-		if la, lb := len([]rune(a.k)), len([]rune(b.k)); la != lb {
-			return cmp.Compare(lb, la) // sort by key length DESC
-		}
-		return cmp.Compare(a.k, b.k)
-	})
+	slices.SortFunc(subsPairs, func(a, b subsKV) int {
+		if a.n != b.n {
+			return cmp.Compare(b.n, a.n) // sort by key length DESC
+		}
+		return cmp.Compare(a.k, b.k)
+	})

And add the import:

 import (
 	"cmp"
 	"slices"
 	"strings"
 	"sync"
 	"unicode"
+	"unicode/utf8"

211-214: Intentional leading space in replacement values — document this quirk.

Building the replacer as ("key", " "+value) guarantees token separation when a key appears mid-word but subtly changes semantics vs a plain substitution. It’s a good trick; call it out in the public docs/comments so users aren’t surprised by leading-space semantics.

I can propose a short README note if you’d like.

slugger/slugger_test.go (2)

335-357: b.Loop requires a very recent Go; add fallback to for-loop if needed.

The new testing.B API is not available in older toolchains. If CI/tooling isn’t pinned, swap to the canonical for i := 0; i < b.N; i++ form.

Run the same go.mod check as above. If not pinned, here’s a fallback:

 func BenchmarkSlugger_Slug(b *testing.B) {
 	sl := New(map[string]string{"&": "and"}, false)
-
-	for b.Loop() {
-		sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "")
-	}
+	for i := 0; i < b.N; i++ {
+		_ = sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "")
+	}
 }
@@
 func BenchmarkSlugger_Slug_WithEmoji(b *testing.B) {
 	sl := New(map[string]string{"&": "and"}, true)
-
-	for b.Loop() {
-		sl.Slug("a 😺, 🐈‍⬛, and a 🦁 go to 🏞️", "")
-	}
+	for i := 0; i < b.N; i++ {
+		_ = sl.Slug("a 😺, 🐈‍⬛, and a 🦁 go to 🏞️", "")
+	}
 }
@@
 func BenchmarkSlugger_Slug_CustomSeparator(b *testing.B) {
 	sl := New(map[string]string{"&": "and"}, false)
-
-	for b.Loop() {
-		sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "|")
-	}
+	for i := 0; i < b.N; i++ {
+		_ = sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "|")
+	}
 }

151-179: Confirm expected ligature mappings align with product goals.

The test asserts æ→ae and ø→oe (per prior learnings about Danish mappings). This is correct for Danish and matches the code path. Just calling out that this is language-specific; for other locales you might prefer stripping to a/o. If broader locale handling is planned later, a note in docs would help.

I used your earlier learning about Danish mappings to validate this behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d8a618b and 654ff48.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • go.mod (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/slugger.go
  • slugger/slugger_test.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.

Applied to files:

  • slugger/slugger.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.

Applied to files:

  • slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
  • New (28-34)
rand/rand.go (2)
  • Pick (77-89)
  • NumberInRange (42-64)
🔇 Additional comments (7)
slugger/slugger.go (4)

23-25: Good concurrency story for shared state.

Guarding both Substitutions and the cached replacer with an RWMutex is the right call; reading r with RLock and rebuilding under write lock balances safety with fast-path performance.


27-33: Copy-on-construct prevents external mutation/races.

New copies the caller’s map (mirrors SetSubstitutions behavior). This removes a whole class of heisenbugs where a caller mutates the map during Slug.


74-115: Thread-safe mutators are correct and minimal.

Add/Remove/Replace/Set all acquire mu.Lock, update Substitutions, and nil the replacer. This matches the lazy rebuild model and keeps Slug lock-free on the hot path.


71-72: No action needed: Go version ≥1.24 is pinned
The module’s go.mod declares go 1.24 and includes a toolchain go1.24.6 directive, so cmp.Or (added in Go 1.22+) is fully supported in CI and tooling. You can safely keep the existing code.

slugger/slugger_test.go (3)

141-207: Edge-case coverage is excellent.

Covers empty/whitespace inputs, safe punctuation, diacritic stripping, case-insensitive substitutions, longest-match precedence, emoji off-path, and custom separators. This gives strong confidence in the new pipeline.


222-333: Concurrency stress test is meaningful and balanced.

The readers/writers mix, errgroup with timeout, and ASCII/space assertions are pragmatic and should catch common race regressions around substitutions/replacer. Nicely done.


111-116: Go version pinned to 1.24 – no shadowing needed

The module’s go.mod specifies

  • go 1.24
  • toolchain go1.24.6

So Go’s per-iteration variable semantics (introduced in 1.22) ensure tt is safely captured in each parallel subtest. No additional tt := tt shadowing is required.

@addvanced addvanced force-pushed the feature/update-slugger branch from 654ff48 to 37c7dd1 Compare August 22, 2025 06:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
slugger/slugger.go (1)

140-147: Typo in comment: œ maps to “oe”, not “ae”.

This example is incorrect and may confuse readers. Code is fine; fix the comment.

Apply:

-// Add more if you run into them (e.g. œ → ae, ß → ss).
+// Add more if you run into them (e.g. œ → oe, ß → ss).
🧹 Nitpick comments (3)
slugger/slugger.go (3)

199-210: Sort by the actual match key length (post-lowercasing) to ensure true longest-match precedence.

You lower-case the input and the replacer keys; however, n is computed on the original key. For edge cases where lowercasing changes rune length, ordering could be suboptimal. Store lowercased k/v upfront, compute n on lowerK, and avoid re-lowering later.

Apply:

- subsPairs := make([]subsKV, 0, len(sl.Substitutions))
- for k, v := range sl.Substitutions {
+ subsPairs := make([]subsKV, 0, len(sl.Substitutions))
+ for k, v := range sl.Substitutions {
     if k == "" {
       continue
     }
-   subsPairs = append(subsPairs,
-     subsKV{
-       k: k,
-       v: strings.ToLower(v),
-       n: utf8.RuneCountInString(k),
-     })
+   lowerK := strings.ToLower(k)
+   lowerV := strings.ToLower(v)
+   subsPairs = append(subsPairs, subsKV{
+     k: lowerK,
+     v: lowerV,
+     n: utf8.RuneCountInString(lowerK),
+   })
   }
@@
- subs := make([]string, 0, len(subsPairs)*2)
- for _, sub := range subsPairs {
-   subs = append(subs, strings.ToLower(sub.k), " "+sub.v)
- }
+ subs := make([]string, 0, len(subsPairs)*2)
+ for _, sub := range subsPairs {
+   subs = append(subs, sub.k, " "+sub.v)
+ }

Also applies to: 216-222, 224-227


186-189: Minor: redundant nil assignment.

Inside initReplacer, sl.replacer is guaranteed to be nil here due to the early return on Line 179–183. You can drop the extra assignment for clarity.

- // Reset the replacer to nil so that it can be rebuilt with the latest substitutions
- sl.replacer = nil
+ // Rebuild with latest substitutions

80-86: Nit: the nil check on Substitutions is unnecessary after New/SetSubstitutions.

copyMap returns a non-nil map, so Substitutions will not be nil in normal use. This is harmless, but you could simplify.

- if sl.Substitutions == nil {
-   sl.Substitutions = make(map[string]string)
- }
+ // Substitutions is non-nil after New/SetSubstitutions; keep for defensive coding if desired.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 654ff48 and 37c7dd1.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • go.mod (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/slugger_test.go
  • slugger/slugger.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.

Applied to files:

  • slugger/slugger.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.

Applied to files:

  • slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
  • New (29-35)
rand/rand.go (2)
  • Pick (77-89)
  • NumberInRange (42-64)
🔇 Additional comments (6)
slugger/slugger.go (2)

54-65: Good synchronization on replacer build and mutation.

  • Reads are under RLock with a lazy init path.
  • initReplacer() is write-locked, and mutators lock and nil the cache.
  • This removes the prior sync.Once data race risk and makes Slug safe under concurrent readers/writers.

Also applies to: 176-185, 75-86, 88-101, 103-116, 118-125


72-73: Ensure Go toolchain compatibility for cmp.Or (Go ≥1.22)

  • go.mod declares go 1.24 and a toolchain go1.24.6, so the use of cmp.Or (added in Go 1.22) is fully supported by your module settings.
  • I didn’t find any CI workflow files in .github/workflows/*.yml. Please confirm that your CI environment (GitHub Actions, Travis, CircleCI, etc.) is configured to use Go 1.24 or later so that cmp.Or is available during automated builds and tests.
slugger/slugger_test.go (4)

113-116: Parallel subtests: nice touch.

Good use of t.Parallel for independent cases; matches the concurrency-safe design in slugger.go.

Also applies to: 210-214


181-193: Great: validates longest-match substitution precedence.

This test will catch regressions where shorter keys (&) are applied before longer ones (&&).


169-179: ASCII normalization coverage looks solid.

Good to see explicit checks for æ/ø/ß and NFKD diacritics. This aligns with ligatureReplacer and normalizeWordsToSafeASCII.


268-292: Go version requirements satisfied

  • go.mod declares go 1.24 (line 3), meeting the minimum required toolchain version.
  • Both .github/workflows/go.yml and .github/workflows/golangci-lint.yml use actions/setup-go@v5 with go-version-file: 'go.mod', so CI will automatically pick up Go 1.24.

No further changes to the module directive or CI configuration are needed.

@addvanced addvanced force-pushed the feature/update-slugger branch from 37c7dd1 to bac8092 Compare August 22, 2025 07:13
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
slugger/slugger.go (1)

173-180: Consider the race condition if replacer is modified between checks

There's a potential race where the replacer could be set by another goroutine between line 175's check and line 176's unlock. While the current implementation is safe because initReplacer uses proper locking and multiple initializations are harmless, this pattern could be cleaner.

Consider simplifying to avoid the early return pattern:

 func (sl *Slugger) initReplacer() {
 	sl.mu.Lock()
-
-	if sl.replacer != nil {
-		sl.mu.Unlock()
-
-		return
-	}
 	defer sl.mu.Unlock()
+
+	// Recheck under lock in case another goroutine already initialized
+	if sl.replacer != nil {
+		return
+	}
slugger/slugger_test.go (1)

308-309: Minor: timestamp format could be more readable

The timestamp format "150405.000" generates values like "150405.000" which might not be immediately recognizable as a time. Consider using a more standard format for debugging.

Consider using a more readable timestamp format:

-				k := pick(keys) + time.Now().Format("150405.000")
+				k := pick(keys) + time.Now().Format("15:04:05.000")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 37c7dd1 and bac8092.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • go.mod (1 hunks)
  • slugger/slugger.go (1 hunks)
  • slugger/slugger_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.

Applied to files:

  • slugger/slugger.go
  • slugger/slugger_test.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.

Applied to files:

  • slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
  • New (29-35)
rand/rand.go (2)
  • Pick (77-89)
  • NumberInRange (42-64)
🔇 Additional comments (6)
slugger/slugger.go (4)

28-35: Good call using copyMap to prevent external mutations!

I see from our past discussions that you've addressed the data race concern by copying the input map. This ensures the Slugger owns its substitutions and prevents external callers from causing race conditions.


54-65: Proper concurrency control with RWMutex!

The double-checked pattern with RLock for the fast path and lazy initialization works well here. The read lock protects against concurrent reads while mutators are operating, and the initReplacer properly handles write locking internally.


139-143: Excellent ligature handling!

The mappings are linguistically correct - æ→ae and ø→oe for Danish text normalization. Using a pre-built strings.Replacer for these common ligatures is efficient.


219-220: I couldn’t find any tests or documentation explaining why a leading space is prepended to each substitution value, and the quick scan of nearby code didn’t reveal an obvious rationale. To understand whether this is intentional—for example, to ensure word boundaries when replacing—or an artifact that could be removed, I’ve added a deeper inspection script. Once you’ve confirmed the intent (or pointed me to existing docs/tests), I can advise whether we should remove the extra space or document its necessity.

slugger/slugger_test.go (2)

222-333: Comprehensive concurrency testing!

Excellent stress test with 8 concurrent readers and 4 writers exercising all mutation methods. The timeout context and ASCII/space validation provide good safety checks. Using errgroup for coordinated error handling is a best practice.


187-193: Smart test for longest-match precedence!

This test case effectively validates that the sorting by length (descending) in initReplacer works correctly. Without proper ordering, "&&" would never match if "&" was processed first.

@kashifkhan0771
Copy link
Owner

I am super busy these days. Once I get a chance I will review this PR.

@kashifkhan0771
Copy link
Owner

kashifkhan0771 commented Sep 11, 2025

@addvanced when you get a chance please resolve the conflicts. I'll try to ship this by the end of week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants