Languages: English | 简体中文 | 日本語 | Español | Français
Atomic operations with explicit memory ordering for Go.
Go's sync/atomic provides atomic operations with sequential consistency. This library exposes C++11/C11 memory model orderings (Relaxed, Acquire, Release, AcqRel) through architecture-specific implementations.
import "code.hybscloud.com/atomix"
var counter atomix.Int64
// Method-based API with ordering suffix
counter.AddRelaxed(1) // Relaxed: no synchronization
counter.Add(1) // AcqRel: default safe ordering
// Pointer-based API for raw memory
var flags int32
atomix.Relaxed.StoreInt32(&flags, 1)
val := atomix.Acquire.LoadInt32(&flags)go get code.hybscloud.com/atomixRequirements: Go 1.25+
The library implements four orderings from the C++11 memory model:
| Ordering | Semantics |
|---|---|
| Relaxed | Atomicity only. No synchronization or ordering constraints. |
| Acquire | Subsequent reads/writes cannot be reordered before this load. Pairs with Release stores. |
| Release | Prior reads/writes cannot be reordered after this store. Pairs with Acquire loads. |
| AcqRel | Combines Acquire and Release semantics. For read-modify-write operations. |
Default methods (no ordering suffix) use:
- Load operations: Relaxed
- Store operations: Relaxed
- Read-modify-write operations: AcqRel
Note: sync/atomic uses acquire semantics for Load and release semantics for Store (which amounts to sequential consistency on x86). atomix defaults to Relaxed for maximum performance on weakly ordered architectures; use LoadAcquire/StoreRelease when sync/atomic-equivalent ordering is required.
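For example, a minimal sketch of the defaults versus explicit suffixes, using the method names listed above:

```go
var c atomix.Int64

c.Add(1)        // read-modify-write: AcqRel by default
c.AddRelaxed(1) // explicitly Relaxed, e.g. for a statistics counter

_ = c.Load()        // default Load ordering is Relaxed
_ = c.LoadAcquire() // sync/atomic-equivalent load ordering
c.StoreRelease(0)   // sync/atomic-equivalent store ordering
```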
| Use Case | Ordering | Rationale |
|---|---|---|
| Statistics counters | Relaxed | No synchronization needed; eventual consistency acceptable |
| Reference counting | AcqRel | Ensures visibility of object state before deallocation |
| Producer-consumer flags | Release/Acquire | Producer releases data, consumer acquires |
| Spinlock acquire | Acquire | Critical section reads must see prior writes |
| Spinlock release | Release | Critical section writes must complete before unlock |
| Sequence locks | AcqRel | Both directions need ordering |
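As a concrete sketch of the Release/Acquire pairing from the table, using the LoadAcquire/StoreRelease methods mentioned above (types and values are illustrative):

```go
var (
	payload int           // plain, non-atomic data
	ready   atomix.Uint32 // publication flag
)

// Producer: write the data, then publish it. The Release store keeps the
// write to payload from being reordered after the flag update.
func produce() {
	payload = 42
	ready.StoreRelease(1)
}

// Consumer: spin with Acquire loads so that, once ready == 1 is observed,
// the read of payload is guaranteed to see the producer's write.
func consume() int {
	for ready.LoadAcquire() == 0 {
	}
	return payload
}
```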
| Type | Size | Description |
|---|---|---|
| `Bool` | 4 bytes | Atomic boolean (backed by `uint32`) |
| `Int32`, `Uint32` | 4 bytes | 32-bit integers |
| `Int64`, `Uint64` | 8 bytes | 64-bit integers |
| `Uintptr` | 8 bytes | Pointer-sized integer |
| `Pointer[T]` | 8 bytes | Generic atomic pointer |
| `Int128`, `Uint128` | 16 bytes | 128-bit integers (requires 16-byte alignment) |
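A brief usage sketch of the generic pointer type, assuming `Pointer[T]` mirrors the shape of sync/atomic's `Pointer[T]` (stores and loads a `*T`):

```go
type config struct {
	limit int
}

var current atomix.Pointer[config]

current.Store(&config{limit: 100}) // writer publishes a new snapshot
cfg := current.Load()              // readers obtain a *config atomically
_ = cfg.limit
```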
Padded variants (Int64Padded, Uint64Padded, etc.) occupy a full cache line (64 bytes) to prevent false sharing when multiple atomic variables are accessed by different CPU cores.
```go
// Without padding: variables may share cache line, causing contention
var a, b atomix.Int64 // May be adjacent in memory
```

```go
// With padding: each variable occupies its own cache line
var a, b atomix.Int64Padded // 64-byte separation guaranteed
```

| Operation | Returns | Description |
|---|---|---|
| `Load` | value | Atomic read |
| `Store` | — | Atomic write |
| `Swap` | old value | Atomic exchange |
| `CompareAndSwap` | bool | Returns true if exchange occurred |
| `CompareExchange` | old value | Returns previous value regardless of success |
| `Add`, `Sub` | new value | Atomic arithmetic |
| `Inc`, `Dec` | new value | Atomic increment/decrement by 1 |
| `And`, `Or`, `Xor` | old value | Atomic bitwise operations |
| `Max`, `Min` | old value | Atomic maximum/minimum |
Return value semantics: Add/Sub/Inc/Dec return the new value (like sync/atomic). Swap/And/Or/Xor/Max/Min return the old value.
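A quick sketch of this convention (illustrative values on an `atomix.Uint32`):

```go
var v atomix.Uint32

fmt.Println(v.Add(5))   // 5: Add returns the new value
fmt.Println(v.Or(0x08)) // 5: Or returns the value held before the OR
fmt.Println(v.Load())   // 13 (5 | 0x08)
```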
The two compare-and-swap variants differ only in what they return:

```go
// CompareAndSwap: returns success/failure
if v.CompareAndSwap(old, new) {
	// Success
}

// CompareExchange: returns previous value (enables CAS loops without a separate Load)
for {
	old := v.Load()
	new := transform(old)
	if v.CompareExchange(old, new) == old {
		break // Success
	}
}
```

For interoperation with memory-mapped regions, shared memory, or io_uring rings:
```go
var flags int32
atomix.Relaxed.StoreInt32(&flags, 1)
val := atomix.Acquire.LoadInt32(&flags)
atomix.Release.CompareAndSwapInt32(&flags, 0, 1)
```

The pointer-based API operates on raw `*int32`, `*int64`, etc., rather than wrapper types. This is useful when atomic variables cannot use wrapper types (e.g., fields in kernel-shared structures).
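As an illustrative sketch of that case, the snippet below drives the head/tail indices of a ring structure through the pointer-based API; it assumes `Relaxed.LoadInt32` and `Release.StoreInt32` counterparts to the calls shown above, and uses an ordinary struct as a stand-in for a memory-mapped region:

```go
// ring stands in for a structure shared with the kernel or another process;
// its fields cannot be declared as atomix wrapper types.
type ring struct {
	head int32 // advanced by the consumer
	tail int32 // advanced by the producer
}

func consumeOne(r *ring) bool {
	head := atomix.Relaxed.LoadInt32(&r.head) // only the consumer writes head
	tail := atomix.Acquire.LoadInt32(&r.tail) // acquire: see entries published before tail moved
	if head == tail {
		return false // ring is empty
	}
	// ... process the entry at index head ...
	atomix.Release.StoreInt32(&r.head, head+1) // release: processing completes before head advances
	return true
}
```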
128-bit atomics require 16-byte alignment. Use placement helpers for shared memory:
```go
buf := make([]byte, 32)
_, ptr := atomix.PlaceAlignedUint128(buf, 0)
ptr.Store(lo, hi)

var v atomix.Uint128 // Type ensures alignment
v.Store(lo, hi)
```

| Architecture | 128-bit Implementation |
|---|---|
| amd64 | LOCK CMPXCHG16B |
| arm64 | LDXP/STXP (default) or CASP (-tags=lse2) |
| riscv64, loong64 | Spinlock emulation (LL/SC on low 64 bits) |
Note: 128-bit atomics are primarily useful for double-word CAS patterns (e.g., lock-free data structures with version counters).
x86-64 provides Total Store Ordering (TSO), a strong memory model where:
- All loads have implicit acquire semantics
- All stores have implicit release semantics
- Store-load ordering requires explicit barrier (MFENCE) or locked instruction
Consequently, all ordering variants compile to identical machine code on x86-64. The primary benefit of explicit ordering on x86-64 is documentation and portability.
| Operation | Instruction | Notes |
|---|---|---|
| Load | `MOV` | Plain memory access |
| Store | `MOV` | Plain memory access |
| Add | `LOCK XADD` | Returns old value |
| Swap | `XCHG` | Implicit `LOCK` |
| CAS | `LOCK CMPXCHG` | |
| And/Or/Xor | `LOCK CMPXCHG` loop | Returns old value via CAS loop |
| CAS128 | `LOCK CMPXCHG16B` | |
Load and Store are implemented in pure Go for compiler inlining.
ARM64 has a weakly ordered memory model requiring explicit ordering instructions. LSE (Large System Extensions) provides atomic instructions with ordering suffixes:
Suffix meanings: No suffix = Relaxed, A = Acquire, L = Release, AL = Acquire-Release
| Operation | Relaxed | Acquire | Release | AcqRel |
|---|---|---|---|---|
| Load | `LDR` | `LDAR` | — | — |
| Store | `STR` | — | `STLR` | — |
| Add | `LDADD` | `LDADDA` | `LDADDL` | `LDADDAL` |
| CAS | `CAS` | `CASA` | `CASL` | `CASAL` |
| Swap | `SWP` | `SWPA` | `SWPL` | `SWPAL` |
| And | `LDCLR`† | `LDCLRA` | `LDCLRL` | `LDCLRAL` |
| Or | `LDSET` | `LDSETA` | `LDSETL` | `LDSETAL` |
| Xor | `LDEOR` | `LDEORA` | `LDEORL` | `LDEORAL` |
† `LDCLR` clears bits (AND with the complement); `And(mask)` is therefore implemented by passing the complement of `mask` to `LDCLR`.
Relaxed load/store are implemented in pure Go for inlining. Other orderings use assembly with LSE instructions.
| Build Tag | Instructions | Target Hardware |
|---|---|---|
| (default) | `LDXP`/`STXP` (LL/SC loop) | All ARMv8+ |
| `-tags=lse2` | `CASP` (single instruction) | ARMv8.4+ with LSE2 |
LL/SC (Load-Link/Store-Conditional) retries on contention. CASP provides single-instruction atomicity but requires newer hardware.
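For example, to select the `CASP` path when targeting such hardware:

```sh
go build -tags=lse2 ./...
```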
RISC-V RVWMO (Weak Memory Ordering) uses explicit fence instructions:
| Operation | Implementation |
|---|---|
| Load Relaxed | LD |
| Load Acquire | LD + FENCE R,RW |
| Store Relaxed | SD |
| Store Release | FENCE RW,W + SD |
| RMW | AMO instructions with .aq/.rl modifiers |
128-bit operations use spinlock-based emulation.
LoongArch uses DBAR (data barrier) instructions:
| Operation | Implementation |
|---|---|
| Load Relaxed | LD.D |
| Load Acquire | LD.D + DBAR |
| Store Relaxed | ST.D |
| Store Release | DBAR + ST.D |
| RMW | AM*_DB instructions |
128-bit operations use spinlock-based emulation.
Unsupported architectures use sync/atomic, which provides sequential consistency. 128-bit operations on fallback architectures are not atomic (two separate 64-bit operations).
- Performance on weak architectures: ARM64/RISC-V can use weaker (faster) instructions when full ordering isn't needed
- Documentation: Ordering suffix documents synchronization intent
- Portability: Code explicitly specifies requirements rather than relying on architecture-specific guarantees
- Correctness: Makes memory ordering decisions explicit and reviewable
sync/atomic provides sequential consistency, which is:
- Sufficient for most use cases
- Portable across all architectures
- Simple to reason about
Use atomix when:
- Building high-performance lock-free data structures
- Interoperating with kernel or hardware interfaces (io_uring, shared memory)
- Porting C/C++ code with explicit memory ordering
- Targeting ARM64/RISC-V where weaker ordering provides measurable benefit
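As an illustration of the porting case, the ordering-suffix methods map directly onto `std::atomic` operations. The C++ forms in the comments are for comparison only; note the return-value difference (`fetch_sub` returns the old value, `Sub` returns the new one):

```go
var refs atomix.Int64

// C++: refs.fetch_add(1, std::memory_order_relaxed);
refs.AddRelaxed(1)

// C++: if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1) { destroy(); }
if refs.Sub(1) == 0 { // Sub returns the new value, so compare against 0
	// Last reference released; writes to the object are visible here (AcqRel).
}
```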
| Platform | Implementation |
|---|---|
| linux/amd64 | Native assembly |
| linux/arm64 | Native assembly with LSE |
| linux/riscv64 | Native assembly (128-bit emulated) |
| linux/loong64 | Native assembly (128-bit emulated) |
| darwin/amd64, darwin/arm64 | Native assembly |
| freebsd/amd64, freebsd/arm64 | Native assembly |
| Other | sync/atomic fallback |
For maximum performance, atomix can be integrated with the Go compiler so that atomic operations are emitted as inline instructions, eliminating function-call overhead. See intrinsics.md for the implementation approach.
MIT — see LICENSE.
©2026 Hayabusa Cloud Co., Ltd.