-
Notifications
You must be signed in to change notification settings - Fork 35
slugger: Added new functions and some refactoring/cleanup #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds concurrency-safe, dynamic substitution management to Slugger via a mutex-guarded cached Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Slugger
participant Replacer
participant Normalizer
rect rgb(240,248,255)
Note over User,Slugger: Substitution management (writers)
User->>Slugger: AddSubstitution / RemoveSubstitution / ReplaceSubstitution / SetSubstitutions
Slugger-->>Slugger: Lock mu, mutate substitutions, invalidate replacer
end
rect rgb(245,255,240)
Note over User,Slugger: Slug generation (readers)
User->>Slugger: Slug(input, separator)
Slugger->>Slugger: RLock -> ensure replacer built (initReplacer if nil)
Slugger->>Replacer: apply substitutions (if present)
Replacer-->>Slugger: substituted text
Slugger->>Normalizer: ligature replace -> NFKD normalize -> ASCII extract words
Normalizer-->>Slugger: []words
Slugger-->>User: join(words, chosenSeparator)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (12)
slugger/EXAMPLES.md (1)
165-170: Minor formatting nit: inconsistent indentation inside code blockTwo lines use 4 spaces instead of tabs/consistent indentation, which stands out in the rendered snippet.
Apply this diff to normalize indentation:
- s.AddSubstitution("the", "") + s.AddSubstitution("the", "") @@ - s.ReplaceSubstitution("the", "a") + s.ReplaceSubstitution("the", "a")slugger/README.md (1)
17-33: APIs documented clearly; add a thread-safety note for mutatorsDocs for Add/Remove/Replace/Set look good.
Given these methods mutate shared state, please add a short note that Slugger is not safe for concurrent use when these are called concurrently with Slug. Either document this or guarantee safety in code.
Suggested addition under “Slugger Methods”:
- Thread-safety: Calling AddSubstitution, RemoveSubstitution, ReplaceSubstitution, or SetSubstitutions concurrently with Slug on the same Slugger instance is not safe. Use separate instances or external synchronization.
slugger/slugger.go (10)
36-65: Data race possibility when mutating substitutions concurrently with SlugSlug uses sync.Once to lazily build the replacer, and mutators reset the Once. Without synchronization around sl.replacer/Sl.Substitutions access, concurrent Slug and Add/Remove/Replace/Set can race, leading to undefined behavior.
Two options:
- Document that Slugger is not safe for concurrent mutation and usage.
- Or guard state with a mutex. Minimal locking around replacer access:
func (sl *Slugger) Slug(s, separator string) string { if s == "" { return "" } if sl.WithEmoji { s = gomoji.ReplaceEmojisWithSlug(s) } s = strings.ToLower(s) - sl.init.Do(sl.initReplacer) - if sl.replacer != nil { - s = sl.replacer.Replace(s) - } + sl.init.Do(sl.initReplacer) + // Optionally guard replacer read if mutators may run concurrently: + // sl.mu.RLock() + // r := sl.replacer + // sl.mu.RUnlock() + // if r != nil { + // s = r.Replace(s) + // } + if sl.replacer != nil { + s = sl.replacer.Replace(s) + }If you want, I can follow up with a full RWMutex integration across mutators and initReplacer.
66-75: Eagerly nil-out replacer when mutating to avoid stale stateResetting sl.init is good; also consider setting sl.replacer = nil so a subsequent Slug call can’t accidentally reuse a stale replacer if init isn’t triggered (e.g., via early returns).
Apply this diff:
func (sl *Slugger) AddSubstitution(key, value string) { if sl.Substitutions == nil { sl.Substitutions = make(map[string]string) } sl.Substitutions[key] = value sl.init = sync.Once{} + sl.replacer = nil }Repeat the same for RemoveSubstitution, ReplaceSubstitution, and SetSubstitutions.
76-86: No-op guards are fine; consider early nil-out and doc behaviorBehavior is fine. If key doesn’t exist, nothing happens. Consider explicitly documenting that Replace/Remove on a non-existent key are no-ops.
Apply this diff to proactively clear the cached replacer as well:
if _, exists := sl.Substitutions[key]; exists { delete(sl.Substitutions, key) sl.init = sync.Once{} + sl.replacer = nil }
88-98: ReplaceSubstitution no-op for missing keys; consider Add+Replace or doc itIt’s fine to require the key to exist; alternatively, you could treat Replace as Add-or-Replace. If keeping current behavior, please document it and also clear the cached replacer.
if _, exists := sl.Substitutions[key]; exists { sl.Substitutions[key] = newValue sl.init = sync.Once{} + sl.replacer = nil }
100-104: SetSubstitutions shares the caller’s map — consider defensive copyAssigning the map by reference means external mutations affect Slugger’s internal state, potentially mid-flight. Consider copying to avoid surprises.
func (sl *Slugger) SetSubstitutions(substitutions map[string]string) { - sl.Substitutions = substitutions + if substitutions == nil { + sl.Substitutions = nil + } else { + // defensive copy + cp := make(map[string]string, len(substitutions)) + for k, v := range substitutions { + cp[k] = v + } + sl.Substitutions = cp + } sl.init = sync.Once{} // Reset the initialization to rebuild the replacer + sl.replacer = nil }
106-114: Consider adding a few more common ligatures/specials and dashesNice touch. Consider adding œ → oe, ff/fi/fl → ff/fi/fl and mapping en/em-dash (–, —) and hyphen-minus variants (‐) to a regular hyphen to avoid token merging.
var ligatureReplacer = strings.NewReplacer( "æ", "ae", "ø", "oe", "ß", "ss", + "œ", "oe", + "ff", "ff", + "fi", "fi", + "fl", "fl", + "–", "-", + "—", "-", + "‐", "-", )
115-139: Hyphen retention can yield triple separators (e.g., “A - B” → “a---b”)normalizeWordsToSafeASCII keeps ASCII “-_.“ and splits on whitespace, so an input like “A - B” produces tokens ["a","-","b"] and joining with "-" becomes "a---b". If you want to avoid this, either drop "-" from keep and rely solely on the joiner, or normalize standalone hyphens to spaces before Fields.
One approach: transform isolated hyphens to spaces prior to Fields, or remove "-" from keep and explicitly inject separators later.
155-162: Trim/normalize substitution values when building the replacerUsers may pass values with surrounding spaces. Trimming avoids double-spacing and inconsistent tokenization.
for k, v := range sl.Substitutions { if k == "" { continue } - subsPairs = append(subsPairs, subsKV{k: k, v: strings.ToLower(v)}) + subsPairs = append(subsPairs, subsKV{ + k: k, + v: strings.ToLower(strings.TrimSpace(v)), + }) }
167-173: Longest-first sort should consider the lowered key length for rare UnicodeYou sort by len(a.k), but replacements are matched on strings.ToLower(key). For unusual keys where lowercasing expands length (e.g., Turkish İ), ordering could be off. Low impact, but easy to fix.
-slices.SortFunc(subsPairs, func(a, b subsKV) int { - if la, lb := len(a.k), len(b.k); la != lb { - return cmp.Compare(lb, la) // sort by key length DESC - } - return cmp.Compare(a.k, b.k) -}) +slices.SortFunc(subsPairs, func(a, b subsKV) int { + ak, bk := strings.ToLower(a.k), strings.ToLower(b.k) + if la, lb := len(ak), len(bk); la != lb { + return cmp.Compare(lb, la) // sort by key length DESC + } + return cmp.Compare(ak, bk) +})
175-179: Leading space in replacement is intentional; consider documenting pitfallReplacing key with " "+value can cause concatenation when the key is adjacent to a word (e.g., "a&b" -> "a andb"). Your tests prefer this behavior (e.g., "#HelloWorld" → "hashtaghelloworld"). Consider calling it out in README so users know how adjacency is handled.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
slugger/EXAMPLES.md(1 hunks)slugger/README.md(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.524Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.524Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/README.mdslugger/EXAMPLES.mdslugger/slugger_test.goslugger/slugger.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.
Applied to files:
slugger/slugger.go
🧬 Code Graph Analysis (1)
slugger/slugger_test.go (1)
slugger/slugger.go (1)
New(28-34)
🔇 Additional comments (4)
slugger/slugger.go (1)
3-8: Go version requirement satisfied
- go.mod specifies
go 1.24- CI is using Go 1.24.6, which includes both
cmp/slicesAPIs (Go 1.21+) andtesting.B.Loop
No changes required.slugger/slugger_test.go (3)
98-101: t.Parallel in subtests: nice for speeding up the suiteParallelizing subtests is appropriate here and safe since each test uses its own Slugger instance.
126-205: Edge-case coverage is strongGood coverage for whitespace collapsing, diacritics removal, case-insensitive substitutions, longest-match behavior, emoji off-path, and multi-character separators. This materially reduces regression risk.
207-229: b.Loop is supported in Go 1.24; no fallback required
The module’s go.mod declares Go 1.24 and testing.B.Loop was introduced in Go 1.24, so these benchmarks will compile as-is under your CI toolchain.
8219808 to
7b8e054
Compare
0d18b1d to
f733604
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
slugger/slugger_test.go (1)
100-120: Consider adding a race-focused test to exercise concurrent updates vs. reads.Given the new dynamic mutation APIs, add a targeted race test to validate thread safety under
-race.Example (drop into this file) to run with
go test -race:func TestSlugger_ConcurrentUpdates(t *testing.T) { t.Parallel() sl := New(map[string]string{"&": "and"}, false) done := make(chan struct{}) defer close(done) // Readers var wg sync.WaitGroup for i := 0; i < 8; i++ { wg.Add(1) go func() { defer wg.Done() for j := 0; j < 10_000; j++ { _ = sl.Slug("A && B & C", "-") } }() } // Writers for i := 0; i < 4; i++ { wg.Add(1) go func(k int) { defer wg.Done() for j := 0; j < 2_000; j++ { switch j % 3 { case 0: sl.AddSubstitution("&&", "andand") case 1: sl.ReplaceSubstitution("&", "and") default: sl.RemoveSubstitution("&&") } } }(i) } wg.Wait() }This will help surface any data races between
Slugand the mutation methods.slugger/slugger.go (3)
118-130: Normalize substitution keys/values to lowercase at the API boundary for consistency.Input is lowercased before replacement, but your map stores keys verbatim. Adding with "FOO" then removing with "foo" will fail. Canonicalizing at write-time simplifies reasoning and avoids duplicate-case entries.
Apply:
func (sl *Slugger) AddSubstitution(key, value string) { sl.mu.Lock() defer sl.mu.Unlock() if sl.Substitutions == nil { sl.Substitutions = make(map[string]string) } - sl.Substitutions[key] = value + key = strings.ToLower(key) + value = strings.ToLower(value) + sl.Substitutions[key] = value sl.replacer = nil } func (sl *Slugger) RemoveSubstitution(key string) { sl.mu.Lock() defer sl.mu.Unlock() if len(sl.Substitutions) == 0 { return } - if _, exists := sl.Substitutions[key]; exists { + key = strings.ToLower(key) + if _, exists := sl.Substitutions[key]; exists { delete(sl.Substitutions, key) sl.replacer = nil } } func (sl *Slugger) ReplaceSubstitution(key, newValue string) { sl.mu.Lock() defer sl.mu.Unlock() if len(sl.Substitutions) == 0 { return } - if _, exists := sl.Substitutions[key]; exists { - sl.Substitutions[key] = newValue + key = strings.ToLower(key) + newValue = strings.ToLower(newValue) + if _, exists := sl.Substitutions[key]; exists { + sl.Substitutions[key] = newValue sl.replacer = nil } } func (sl *Slugger) SetSubstitutions(substitutions map[string]string) { sl.mu.Lock() defer sl.mu.Unlock() if substitutions == nil { sl.Substitutions = nil } else { cp := make(map[string]string, len(substitutions)) for k, v := range substitutions { - cp[k] = v + cp[strings.ToLower(k)] = strings.ToLower(v) } sl.Substitutions = cp } sl.replacer = nil }Also applies to: 71-84, 85-99, 101-115
203-211: Minor: avoid repeatedToLowerin sort comparator; precompute lowercased keys.
strings.ToLowerin the comparator is called O(n log n) times. Precomputing lowers for keys/values once reduces overhead for larger substitution maps.Example refactor:
- type subsKV struct { - k, v string - } + type subsKV struct { + k, v string // original + lk, lv string // lowercased + } ... - for k, v := range sl.Substitutions { + for k, v := range sl.Substitutions { if k == "" { continue } - subsPairs = append(subsPairs, subsKV{ - k: k, - v: strings.ToLower(v), - }) + subsPairs = append(subsPairs, subsKV{ + k: k, + v: v, + lk: strings.ToLower(k), + lv: strings.ToLower(v), + }) } ... - slices.SortFunc(subsPairs, func(a, b subsKV) int { - ak, bk := strings.ToLower(a.k), strings.ToLower(b.k) - if la, lb := len(ak), len(bk); la != lb { + slices.SortFunc(subsPairs, func(a, b subsKV) int { + if la, lb := len(a.lk), len(b.lk); la != lb { return cmp.Compare(lb, la) // sort by key length DESC } - return cmp.Compare(ak, bk) + return cmp.Compare(a.lk, b.lk) }) ... - subs = append(subs, strings.ToLower(sub.k), " "+sub.v) + subs = append(subs, sub.lk, " "+sub.lv)Also applies to: 188-198, 212-215
214-215: Intentional leading space in replacer output—document the behavior.Using
" "+sub.vintentionally forces word boundaries sostrings.Fieldscan collapse to separators. This is clever but non-obvious. Consider adding a short note in README and here to make this contract explicit (especially for users expecting exact in-place substitution).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
slugger/EXAMPLES.md(1 hunks)slugger/README.md(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- slugger/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- slugger/EXAMPLES.md
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/slugger.goslugger/slugger_test.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.
Applied to files:
slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (1)
slugger/slugger.go (1)
New(29-35)
🔇 Additional comments (5)
slugger/slugger_test.go (3)
62-95: Great coverage of dynamic substitutions, emoji handling, normalization, and longest-match logic.The table-driven tests and edge cases align with the new API surface and expected behaviors, including longest-match precedence (&& vs &), ASCII normalization, and multi-char separators.
Also applies to: 128-168
211-216: Benchmarkb.Loop()is supported by Go 1.24
Yourgo.modspecifiesgo 1.24, which includes support fortesting.B.Loop()—no changes are needed.
98-106: Tests are safe under Go 1.24; no rebind needed
The go.mod file specifies Go 1.24, which includes the per-iteration variable binding semantics introduced in Go 1.22. The parallel subtests in slugger/slugger_test.go (lines 98–106 and 198–206) correctly capturett, so the suggestedtt := ttworkaround is unnecessary.slugger/slugger.go (2)
171-181: Longest-match precedence looks good; strings.Replacer will prefer earlier-inserted patterns at the same index.Sorting by key length DESC then lexicographically ensures "&&" wins over "&" at the same position. This matches your edge-case tests.
Also applies to: 203-211
68-69: No action needed—Go 1.24 covers bothcmp.Or’s generics requirement andtesting.B.Loop.The module’s
go.modspecifies Go 1.24, which:
- Exceeds Go 1.18’s introduction of generics (so your
cmp.Orhelper is fully supported).- Exceeds Go 1.21’s addition of
testing.B.Loop()(so your benchmarks will compile as written).You can safely leave the code—and the go directive—unchanged.
…string normalization
3870ebb to
ee6cb70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
slugger/slugger.go (1)
129-130: Nit: fix ligature example in comment (œ → oe, not ae).The code maps ø→oe and æ→ae correctly; the comment’s example for œ should be “œ → oe”.
Apply:
-// Add more if you run into them (e.g. œ → ae, ß → ss). +// Add more if you run into them (e.g. œ → oe, ß → ss).
🧹 Nitpick comments (2)
slugger/slugger.go (1)
188-195: Optional: consider rune-length ordering if substitutions include multi-byte keys.You sort by byte length (len). If users add multi-byte keys (e.g., “€” vs “€” + extra), byte-length ordering may not reflect “longest by characters.” Given strings.Replacer compares bytes, current logic is consistent; this is only relevant if a “character length” notion is desired.
Would you prefer rune count ordering?
- if la, lb := len(a.k), len(b.k); la != lb { + if la, lb := len([]rune(a.k)), len([]rune(b.k)); la != lb { return cmp.Compare(lb, la) }slugger/slugger_test.go (1)
126-205: Edge-case suite is strong; consider adding a concurrent update/read stress test.Given the package now supports dynamic updates, a race-guard test that calls Slug concurrently with Add/Replace/Remove would be valuable (and will catch the locking issues noted). I can draft it if you want.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
slugger/EXAMPLES.md(1 hunks)slugger/README.md(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- slugger/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- slugger/EXAMPLES.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/slugger_test.goslugger/slugger.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.
Applied to files:
slugger/slugger.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.
Applied to files:
slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (1)
slugger/slugger.go (1)
New(28-34)
🔇 Additional comments (5)
slugger/slugger.go (3)
27-33: Good: constructor copies caller map.copyMap in New avoids external mutation/data races and mirrors SetSubstitutions semantics. This is the right call.
136-159: LGTM: normalization pipeline is robust and focused.Ligature folding, NFKD decomposition, ASCII filtering, and Fields split are clean and predictable; tests cover the intended behavior.
71-72: No Go version gate needed—module and CI already target Go ≥1.22go.mod declares “go 1.24”, satisfying the requirement for cmp.Or (added in Go 1.22).
GitHub Actions workflows useactions/setup-go@v5withgo-version-file: 'go.mod', so they automatically install Go 1.24 as specified. No.gitlab-ci.ymlis present to adjust.All checks pass; no changes needed.
slugger/slugger_test.go (2)
7-93: Tests read well and cover core scenarios.Nice coverage: defaults, custom separators, emoji path, normalization, and dynamic substitution mutations. Using per-test Slugger instances plus t.Parallel is tidy.
210-212: No changes needed—Go 1.24 is already in useI’ve verified:
- go.mod declares
go 1.24- Both GitHub workflows (
.github/workflows/go.ymlandgolangci-lint.yml) useactions/setup-go@v5withgo-version-file: 'go.mod'Since CI picks up Go 1.24 from go.mod,
testing.B.Loop()at lines 210, 218, and 226 will compile without requiring fallbacks.
ee6cb70 to
d8a618b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
slugger/slugger_test.go (3)
3-11: Prep imports for a fail-fast, context-driven concurrency test.You’ll need context here if you switch TestSluggerConcurrentSubstitutions to errgroup.WithContext (see suggestion below). Also: the range-over-integers and b.Loop usages elsewhere are valid with go 1.24+, which you’ve set in go.mod. (tip.golang.org, pkg.go.dev)
Apply:
import ( - "fmt" + "context" + "fmt" "testing" "time" "unicode" "github.com/kashifkhan0771/utils/rand" "golang.org/x/sync/errgroup" )
213-329: Make the concurrency stress test fail fast and non-flaky (use errgroup.WithContext + timeout).Currently, if a reader detects an error early, other goroutines keep spinning until the 3s sleep elapses because the group has no cancellation path. Switching to a context-driven loop lets the test (a) stop immediately on error and (b) run for a bounded duration without explicit sleeps. Also avoids leaked goroutines if the test fails.
Note: “for range 8/4” is fine in Go ≥1.22, and b.Loop is fine in Go ≥1.24; your go.mod uses toolchain go1.24.6, which is compatible and includes fixes for early b.Loop issues in 1.24.0. (tip.golang.org, pkg.go.dev, github.com)
Apply:
func TestSluggerConcurrentSubstitutions(t *testing.T) { t.Parallel() @@ - var ( - eg = errgroup.Group{} - stop = make(chan struct{}) - ) + ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second) + defer cancel() + eg, ctx := errgroup.WithContext(ctx) @@ - for range 8 { + for range 8 { eg.Go(func() error { for { - select { - case <-stop: - return nil - default: - } + select { + case <-ctx.Done(): + return nil + default: + } @@ - for range 4 { + for range 4 { eg.Go(func() error { keys := []string{"hello", "😀", "æ", "foo", "bar", "baz", "qux", "quux"} for { - select { - case <-stop: - return nil - default: - } + select { + case <-ctx.Done(): + return nil + default: + } @@ - // Run until deadline. - time.Sleep(3 * time.Second) - close(stop) - - // Wait for all goroutines to finish. - if err := eg.Wait(); err != nil { + // Wait for completion (either timeout or earlier failure). + if err := eg.Wait(); err != nil { t.Fatal(err) }Optional: derive the timeout from t.Deadline() when available to play nicer with -timeout and -race.
120-124: Minor: add a focused ReplaceSubstitution test.You exercise ReplaceSubstitution under load, but a deterministic unit test would catch subtle ordering/case-insensitivity bugs without relying on randomness. Example to add as another table row:
{ name: "ReplaceSubstitution updates value only", input: "Price is 10 %", separator: "-", substitutions: map[string]string{"%": "percent"}, substitutionsChange: map[string]string{"%": "pct"}, expected: "price-is-10-pct", }Happy to draft the exact subtest if you prefer a separate TestSlugger_ReplaceSubstitution.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
go.mod(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- slugger/slugger.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.
Applied to files:
go.mod
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/slugger_test.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
New(28-34)rand/rand.go (2)
Pick(77-89)NumberInRange(42-64)
🔇 Additional comments (5)
go.mod (1)
13-16: Commit the updated go.mod after runninggo mod tidyThe
go mod tidyexecution has already:
- Added
golang.org/x/sync v0.16.0into the primaryrequireblock (dropping the// indirectannotation).- Removed the separate indirect-only
requireblock forx/syncand consolidated the module requirements.No further manual edits are needed—please commit the updated
go.mod.slugger/slugger_test.go (4)
66-73: Nice coverage for SetSubstitutions semantics.This validates whole-map replacement and case-insensitive keys. Good guard for API regressions in the new substitutions API.
75-91: Good targeted tests for Add/Remove path.Exercising AddSubstitution and RemoveSubstitution separately helps catch ordering and cache invalidation bugs in the replacer build path. Well done.
179-184: Great “longest-match wins” assertion.This will catch any regression in key ordering (e.g., processing “&” before “&&”). It aligns with the move to strings.Replacer and sorted keys.
331-353: Adopting testing.B.Loop is the right call for Go 1.24+.b.Loop is purpose-built to avoid benchmark gotchas (timer management, DCE) and should be preferred over b.N. Your go.mod’s toolchain (go1.24.6) is sufficient. (pkg.go.dev)
If you haven’t already, consider adding b.ReportAllocs() to one benchmark to track allocator effects of the new replacer path.
6856be9 to
654ff48
Compare
|
@kashifkhan0771 I feel like this should be ready as well now. :) Let's see if the bunny rabbit has any input - if not, then all is good from here. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
slugger/slugger.go (1)
139-146: Fix comment: œ maps to “oe”, not “ae”. Optionally include œ in the table.The example currently says “œ → ae”. Correct mapping is “œ → oe”. Also consider actually adding “œ” to ligatureReplacer for completeness (NFKD typically decomposes it, but being explicit is clearer).
Apply:
-// Add more if you run into them (e.g. œ → ae, ß → ss). +// Add more if you run into them (e.g. œ → oe, ß → ss).Optionally:
var ligatureReplacer = strings.NewReplacer( "æ", "ae", "ø", "oe", "ß", "ss", + "œ", "oe", )
🧹 Nitpick comments (5)
slugger/slugger.go (3)
53-61: Avoid redundant rebuilds under contention (double-check under lock).Today, multiple Slug callers seeing r == nil will all call initReplacer. Each will take the lock and rebuild even if another goroutine already built it moments earlier. It’s safe but wasteful. Add an early check under the write lock and remove the internal “set to nil” to prevent unnecessary rebuilds.
Apply:
func (sl *Slugger) initReplacer() { - sl.mu.Lock() - defer sl.mu.Unlock() - - // Reset the replacer to nil so that it can be rebuilt with the latest substitutions - sl.replacer = nil + sl.mu.Lock() + // Another goroutine might have already built it between the RUnlock and this Lock. + if sl.replacer != nil { + sl.mu.Unlock() + return + } + defer sl.mu.Unlock() if len(sl.Substitutions) == 0 { return }
203-209: Comparator allocates on every compare; precompute rune lengths.len([]rune(a.k)) allocates; it runs many times during sort. Precompute rune counts once when building subsPairs and compare those ints.
Apply:
@@ - // struct to hold key-value pairs for sorting - type subsKV struct { - k, v string - } + // struct to hold key-value pairs plus precomputed rune length for sorting + type subsKV struct { + k, v string + n int + } @@ - for k, v := range sl.Substitutions { + for k, v := range sl.Substitutions { if k == "" { continue } - subsPairs = append(subsPairs, subsKV{k: k, v: strings.ToLower(v)}) + subsPairs = append(subsPairs, subsKV{ + k: k, + v: strings.ToLower(v), + n: utf8.RuneCountInString(k), + }) } @@ - slices.SortFunc(subsPairs, func(a, b subsKV) int { - if la, lb := len([]rune(a.k)), len([]rune(b.k)); la != lb { - return cmp.Compare(lb, la) // sort by key length DESC - } - return cmp.Compare(a.k, b.k) - }) + slices.SortFunc(subsPairs, func(a, b subsKV) int { + if a.n != b.n { + return cmp.Compare(b.n, a.n) // sort by key length DESC + } + return cmp.Compare(a.k, b.k) + })And add the import:
import ( "cmp" "slices" "strings" "sync" "unicode" + "unicode/utf8"
211-214: Intentional leading space in replacement values — document this quirk.Building the replacer as ("key", " "+value) guarantees token separation when a key appears mid-word but subtly changes semantics vs a plain substitution. It’s a good trick; call it out in the public docs/comments so users aren’t surprised by leading-space semantics.
I can propose a short README note if you’d like.
slugger/slugger_test.go (2)
335-357: b.Loop requires a very recent Go; add fallback to for-loop if needed.The new testing.B API is not available in older toolchains. If CI/tooling isn’t pinned, swap to the canonical for i := 0; i < b.N; i++ form.
Run the same go.mod check as above. If not pinned, here’s a fallback:
func BenchmarkSlugger_Slug(b *testing.B) { sl := New(map[string]string{"&": "and"}, false) - - for b.Loop() { - sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "") - } + for i := 0; i < b.N; i++ { + _ = sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "") + } } @@ func BenchmarkSlugger_Slug_WithEmoji(b *testing.B) { sl := New(map[string]string{"&": "and"}, true) - - for b.Loop() { - sl.Slug("a 😺, 🐈⬛, and a 🦁 go to 🏞️", "") - } + for i := 0; i < b.N; i++ { + _ = sl.Slug("a 😺, 🐈⬛, and a 🦁 go to 🏞️", "") + } } @@ func BenchmarkSlugger_Slug_CustomSeparator(b *testing.B) { sl := New(map[string]string{"&": "and"}, false) - - for b.Loop() { - sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "|") - } + for i := 0; i < b.N; i++ { + _ = sl.Slug("Wôrķšpáçè ~~sèťtïñğš~~", "|") + } }
151-179: Confirm expected ligature mappings align with product goals.The test asserts æ→ae and ø→oe (per prior learnings about Danish mappings). This is correct for Danish and matches the code path. Just calling out that this is language-specific; for other locales you might prefer stripping to a/o. If broader locale handling is planned later, a note in docs would help.
I used your earlier learning about Danish mappings to validate this behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
go.mod(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- go.mod
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/slugger.goslugger/slugger_test.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.
Applied to files:
slugger/slugger.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.
Applied to files:
slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
New(28-34)rand/rand.go (2)
Pick(77-89)NumberInRange(42-64)
🔇 Additional comments (7)
slugger/slugger.go (4)
23-25: Good concurrency story for shared state.Guarding both Substitutions and the cached replacer with an RWMutex is the right call; reading r with RLock and rebuilding under write lock balances safety with fast-path performance.
27-33: Copy-on-construct prevents external mutation/races.New copies the caller’s map (mirrors SetSubstitutions behavior). This removes a whole class of heisenbugs where a caller mutates the map during Slug.
74-115: Thread-safe mutators are correct and minimal.Add/Remove/Replace/Set all acquire mu.Lock, update Substitutions, and nil the replacer. This matches the lazy rebuild model and keeps Slug lock-free on the hot path.
71-72: No action needed: Go version ≥1.24 is pinned
The module’s go.mod declaresgo 1.24and includes atoolchain go1.24.6directive, socmp.Or(added in Go 1.22+) is fully supported in CI and tooling. You can safely keep the existing code.slugger/slugger_test.go (3)
141-207: Edge-case coverage is excellent.Covers empty/whitespace inputs, safe punctuation, diacritic stripping, case-insensitive substitutions, longest-match precedence, emoji off-path, and custom separators. This gives strong confidence in the new pipeline.
222-333: Concurrency stress test is meaningful and balanced.The readers/writers mix, errgroup with timeout, and ASCII/space assertions are pragmatic and should catch common race regressions around substitutions/replacer. Nicely done.
111-116: Go version pinned to 1.24 – no shadowing neededThe module’s go.mod specifies
- go 1.24
- toolchain go1.24.6
So Go’s per-iteration variable semantics (introduced in 1.22) ensure
ttis safely captured in each parallel subtest. No additionaltt := ttshadowing is required.
654ff48 to
37c7dd1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
slugger/slugger.go (1)
140-147: Typo in comment: œ maps to “oe”, not “ae”.This example is incorrect and may confuse readers. Code is fine; fix the comment.
Apply:
-// Add more if you run into them (e.g. œ → ae, ß → ss). +// Add more if you run into them (e.g. œ → oe, ß → ss).
🧹 Nitpick comments (3)
slugger/slugger.go (3)
199-210: Sort by the actual match key length (post-lowercasing) to ensure true longest-match precedence.You lower-case the input and the replacer keys; however, n is computed on the original key. For edge cases where lowercasing changes rune length, ordering could be suboptimal. Store lowercased k/v upfront, compute n on lowerK, and avoid re-lowering later.
Apply:
- subsPairs := make([]subsKV, 0, len(sl.Substitutions)) - for k, v := range sl.Substitutions { + subsPairs := make([]subsKV, 0, len(sl.Substitutions)) + for k, v := range sl.Substitutions { if k == "" { continue } - subsPairs = append(subsPairs, - subsKV{ - k: k, - v: strings.ToLower(v), - n: utf8.RuneCountInString(k), - }) + lowerK := strings.ToLower(k) + lowerV := strings.ToLower(v) + subsPairs = append(subsPairs, subsKV{ + k: lowerK, + v: lowerV, + n: utf8.RuneCountInString(lowerK), + }) } @@ - subs := make([]string, 0, len(subsPairs)*2) - for _, sub := range subsPairs { - subs = append(subs, strings.ToLower(sub.k), " "+sub.v) - } + subs := make([]string, 0, len(subsPairs)*2) + for _, sub := range subsPairs { + subs = append(subs, sub.k, " "+sub.v) + }Also applies to: 216-222, 224-227
186-189: Minor: redundant nil assignment.Inside initReplacer, sl.replacer is guaranteed to be nil here due to the early return on Line 179–183. You can drop the extra assignment for clarity.
- // Reset the replacer to nil so that it can be rebuilt with the latest substitutions - sl.replacer = nil + // Rebuild with latest substitutions
80-86: Nit: the nil check on Substitutions is unnecessary after New/SetSubstitutions.copyMap returns a non-nil map, so Substitutions will not be nil in normal use. This is harmless, but you could simplify.
- if sl.Substitutions == nil { - sl.Substitutions = make(map[string]string) - } + // Substitutions is non-nil after New/SetSubstitutions; keep for defensive coding if desired.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
go.mod(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- go.mod
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/slugger_test.goslugger/slugger.go
📚 Learning: 2025-08-11T04:49:26.949Z
Learnt from: addvanced
PR: kashifkhan0771/utils#171
File: strings/strings.go:0-0
Timestamp: 2025-08-11T04:49:26.949Z
Learning: In the `strings` package (specifically `strings/strings.go`), the `SubstringSearch` function now uses a Unicode-safe implementation for case-insensitive searches. It uses `strings.EqualFold` with rune-based window scanning to handle cases where lowercasing changes byte length (e.g., Turkish İ), ensuring correct index and substring extraction for international text.
Applied to files:
slugger/slugger.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.
Applied to files:
slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
New(29-35)rand/rand.go (2)
Pick(77-89)NumberInRange(42-64)
🔇 Additional comments (6)
slugger/slugger.go (2)
54-65: Good synchronization on replacer build and mutation.
- Reads are under RLock with a lazy init path.
- initReplacer() is write-locked, and mutators lock and nil the cache.
- This removes the prior sync.Once data race risk and makes Slug safe under concurrent readers/writers.
Also applies to: 176-185, 75-86, 88-101, 103-116, 118-125
72-73: Ensure Go toolchain compatibility forcmp.Or(Go ≥1.22)
- go.mod declares
go 1.24and atoolchain go1.24.6, so the use ofcmp.Or(added in Go 1.22) is fully supported by your module settings.- I didn’t find any CI workflow files in
.github/workflows/*.yml. Please confirm that your CI environment (GitHub Actions, Travis, CircleCI, etc.) is configured to use Go 1.24 or later so thatcmp.Oris available during automated builds and tests.slugger/slugger_test.go (4)
113-116: Parallel subtests: nice touch.Good use of t.Parallel for independent cases; matches the concurrency-safe design in slugger.go.
Also applies to: 210-214
181-193: Great: validates longest-match substitution precedence.This test will catch regressions where shorter keys (&) are applied before longer ones (&&).
169-179: ASCII normalization coverage looks solid.Good to see explicit checks for æ/ø/ß and NFKD diacritics. This aligns with ligatureReplacer and normalizeWordsToSafeASCII.
268-292: Go version requirements satisfied
- go.mod declares
go 1.24(line 3), meeting the minimum required toolchain version.- Both
.github/workflows/go.ymland.github/workflows/golangci-lint.ymluseactions/setup-go@v5withgo-version-file: 'go.mod', so CI will automatically pick up Go 1.24.No further changes to the module directive or CI configuration are needed.
37c7dd1 to
bac8092
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
slugger/slugger.go (1)
173-180: Consider the race condition if replacer is modified between checksThere's a potential race where the replacer could be set by another goroutine between line 175's check and line 176's unlock. While the current implementation is safe because
initReplaceruses proper locking and multiple initializations are harmless, this pattern could be cleaner.Consider simplifying to avoid the early return pattern:
func (sl *Slugger) initReplacer() { sl.mu.Lock() - - if sl.replacer != nil { - sl.mu.Unlock() - - return - } defer sl.mu.Unlock() + + // Recheck under lock in case another goroutine already initialized + if sl.replacer != nil { + return + }slugger/slugger_test.go (1)
308-309: Minor: timestamp format could be more readableThe timestamp format
"150405.000"generates values like "150405.000" which might not be immediately recognizable as a time. Consider using a more standard format for debugging.Consider using a more readable timestamp format:
- k := pick(keys) + time.Now().Format("150405.000") + k := pick(keys) + time.Now().Format("15:04:05.000")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
go.mod(1 hunks)slugger/slugger.go(1 hunks)slugger/slugger_test.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- go.mod
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
Learnt from: addvanced
PR: kashifkhan0771/utils#184
File: conversion/helpers.go:0-0
Timestamp: 2025-08-17T10:28:12.553Z
Learning: User addvanced prefers pragmatic solutions that balance clean API design with backward compatibility. When dealing with helper functions that shouldn't have been exported, they're willing to use deprecated wrappers as a compromise while improving the internal implementation with more flexible, general-purpose functions.
📚 Learning: 2025-08-12T07:38:35.407Z
Learnt from: addvanced
PR: kashifkhan0771/utils#177
File: slugger/slugger.go:39-43
Timestamp: 2025-08-12T07:38:35.407Z
Learning: In the slugger package, the user addvanced mentioned they have a separate branch that refactors the substitution logic from using strings.ReplaceAll in a loop to using strings.NewReplacer, which is a more appropriate and efficient approach for handling multiple string replacements.
Applied to files:
slugger/slugger.goslugger/slugger_test.go
📚 Learning: 2025-08-21T09:50:42.184Z
Learnt from: addvanced
PR: kashifkhan0771/utils#189
File: slugger/slugger.go:136-144
Timestamp: 2025-08-21T09:50:42.184Z
Learning: In Danish, the ligature æ maps to "ae" and ø maps to "oe". These are the correct ASCII equivalents for Danish text normalization in the slugger package.
Applied to files:
slugger/slugger.go
🧬 Code graph analysis (1)
slugger/slugger_test.go (2)
slugger/slugger.go (1)
New(29-35)rand/rand.go (2)
Pick(77-89)NumberInRange(42-64)
🔇 Additional comments (6)
slugger/slugger.go (4)
28-35: Good call usingcopyMapto prevent external mutations!I see from our past discussions that you've addressed the data race concern by copying the input map. This ensures the Slugger owns its substitutions and prevents external callers from causing race conditions.
54-65: Proper concurrency control with RWMutex!The double-checked pattern with
RLockfor the fast path and lazy initialization works well here. The read lock protects against concurrent reads while mutators are operating, and theinitReplacerproperly handles write locking internally.
139-143: Excellent ligature handling!The mappings are linguistically correct - æ→ae and ø→oe for Danish text normalization. Using a pre-built
strings.Replacerfor these common ligatures is efficient.
219-220: I couldn’t find any tests or documentation explaining why a leading space is prepended to each substitution value, and the quick scan of nearby code didn’t reveal an obvious rationale. To understand whether this is intentional—for example, to ensure word boundaries when replacing—or an artifact that could be removed, I’ve added a deeper inspection script. Once you’ve confirmed the intent (or pointed me to existing docs/tests), I can advise whether we should remove the extra space or document its necessity.slugger/slugger_test.go (2)
222-333: Comprehensive concurrency testing!Excellent stress test with 8 concurrent readers and 4 writers exercising all mutation methods. The timeout context and ASCII/space validation provide good safety checks. Using
errgroupfor coordinated error handling is a best practice.
187-193: Smart test for longest-match precedence!This test case effectively validates that the sorting by length (descending) in
initReplacerworks correctly. Without proper ordering, "&&" would never match if "&" was processed first.
|
I am super busy these days. Once I get a chance I will review this PR. |
|
@addvanced when you get a chance please resolve the conflicts. I'll try to ship this by the end of week. |
Description:
Added new functionality:
Refactored how Slugger is initialized
Replaced strings.ReplaceAll()-loop with strings.Replacer.
Added a ligatureReplacer map for some manual character replacement (e.g. ø = oe)
Better handling of keys, that are now sorted ascending and by word length, so e.g.
&&is dealt with before&etc.Other minor cleanups here and there
Checklist:
make test.make lint.If added a new utility function or updated existing function logic:
EXAMPLES.mdREADME.mdIf added a new utility package:
README.mdutility packages tableSummary by CodeRabbit
New Features
Documentation
Tests
Chores