GC Mutator and Write Barriers
The term "mutator" refers to application code that modifies heap state—allocating objects, assigning pointers, and modifying data structures. In a concurrent garbage collector, mutators run simultaneously with the GC, creating a fundamental challenge: how does the GC see changes made while it's marking?
The answer is write barriers: small pieces of compiler-inserted code that track pointer modifications while the GC is marking. This article explores Go's write barrier implementation in src/runtime/mbarrier.go.
Why Write Barriers Matter
Go's GC marks most objects concurrently while application code runs. Without write barriers, the following scenario would cause catastrophic failures:
// Time T1: GC scans object A and marks it black
//   A.ptr = B (B is reached through A and shaded grey)
// Time T2: Application assigns (without write barrier)
//   A.ptr = C        // C is still white!
// Time T3: The only other reference to C is cleared
//   A is black and will not be rescanned, so the GC never scans C
//   GC concludes C is white = dead
// Time T4: GC collects C, but A.ptr still points to it!
//   Application crashes (or reads corrupted memory) when dereferencing A.ptr
Write barriers prevent this by shading (marking grey) any objects involved in pointer assignments during GC.
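The scenario can be replayed with a toy tri-color marker. This is a minimal sketch, not the runtime's algorithm: the `object`, `collector`, `shade`, `store`, and `run` names are hypothetical, and the barrier here simply shades both the old and the new value on every store, whereas the runtime's real barrier is more refined.

```go
package main

import "fmt"

// color is the tri-color marking state of an object.
type color int

const (
	white color = iota // not yet reached by the collector
	grey               // reached, but fields not yet scanned
	black              // reached and fully scanned
)

// object is a hypothetical heap object with a single pointer field.
type object struct {
	name string
	ptr  *object
	c    color
}

// collector is a toy marker: a grey work list plus a barrier switch.
type collector struct {
	work    []*object
	barrier bool
}

// shade turns a white object grey and queues it for scanning.
func (gc *collector) shade(o *object) {
	if o != nil && o.c == white {
		o.c = grey
		gc.work = append(gc.work, o)
	}
}

// store performs *slot = ptr, shading the old and new values when the barrier is on.
func (gc *collector) store(slot **object, ptr *object) {
	if gc.barrier {
		gc.shade(*slot)
		gc.shade(ptr)
	}
	*slot = ptr
}

// drain scans grey objects until the work list is empty.
func (gc *collector) drain() {
	for len(gc.work) > 0 {
		o := gc.work[len(gc.work)-1]
		gc.work = gc.work[:len(gc.work)-1]
		gc.shade(o.ptr)
		o.c = black
	}
}

// run replays the T1..T4 scenario and reports whether C was marked.
func run(barrier bool) bool {
	gc := &collector{barrier: barrier}
	b := &object{name: "B"}
	c := &object{name: "C"}
	a := &object{name: "A", ptr: b}
	holder := &object{name: "holder", ptr: c} // the only other reference to C

	// T1: the collector scans A (and therefore B) before the mutator runs.
	gc.shade(a)
	gc.drain()

	// T2: the mutator stores C into the already-black A.
	gc.store(&a.ptr, c)
	// T3: the mutator drops the only other reference to C.
	gc.store(&holder.ptr, nil)

	// The collector scans the remaining root and finishes marking.
	gc.shade(holder)
	gc.drain()
	return c.c != white
}

func main() {
	fmt.Println("C marked without barrier:", run(false))
	fmt.Println("C marked with barrier:   ", run(true))
}
```

Without the barrier, C stays white even though A still points to it, which is exactly the premature collection described above. With the barrier, the store at T2 shades C, so the marker reaches it.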
The Design: Why Write, Not Read?
Go implements write barriers only, avoiding read barriers entirely. This design choice is critical:
- Read barrier: Every pointer read would check GC state, adding overhead to all pointer dereferences
- Write barrier: Only pointer assignments trigger extra logic
Why read barriers are unnecessary:
Go's Dijkstra-Yuasa hybrid write barrier maintains an invariant that makes read barriers redundant. Specifically, it ensures that any object referenced by a black (scanned) object is either:
1. Also black (already scanned), or
2. Grey (queued for scanning)
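This invariant means reads never need to re-check object liveness; only writes need to update tracking. Go's collector also never moves objects, so a pointer that has already been read stays valid without any fix-up on load.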
Write Barrier Implementation
Compiler Insertion
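The Go compiler inserts write barrier code at pointer stores that may target the heap. Stores it can prove need no barrier, such as writes to the current stack frame, are left alone. The insertion happens during compilation, before machine code is generated.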
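Conceptually, each instrumented store becomes a flag check followed by the store itself. The sketch below shows that shape in ordinary Go. It is an illustration only: `writeBarrierEnabled`, `recorded`, and `storeNext` are hypothetical stand-ins for the runtime's internal flag and buffer, and real barriers are emitted by the compiler rather than written by hand.

```go
package main

import "fmt"

// node is a hypothetical heap type with one pointer field.
type node struct{ next *node }

// writeBarrierEnabled stands in for the runtime's internal "GC is marking" flag.
var writeBarrierEnabled bool

// recorded collects the pointers the illustrative barrier has seen.
var recorded []*node

// storeNext shows the shape of the code emitted for n.next = v.
func storeNext(n, v *node) {
	if writeBarrierEnabled {
		// Slow path: record the old and new values for the collector.
		recorded = append(recorded, n.next, v)
	}
	n.next = v // the store itself always happens
}

func main() {
	a, b, c := &node{}, &node{}, &node{}

	storeNext(a, b) // GC idle: plain store, nothing recorded
	fmt.Println("recorded while idle:", len(recorded))

	writeBarrierEnabled = true
	storeNext(a, c) // GC marking: old value b and new value c are recorded
	fmt.Println("recorded while marking:", len(recorded))
}
```

When the flag is off, the only extra cost is the check, which is why the barrier is cheap outside a GC cycle.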
Multiple barrier variants exist due to compiler optimizations:
$ go tool nm myapp | grep gcWriteBarrier
4943c0 t gcWriteBarrier
496c80 t runtime.gcWriteBarrier1
496ca0 t runtime.gcWriteBarrier2
# ... more variants
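In Go 1.21 and later, the numeric suffix indicates how many pointer slots the call reserves in the write barrier buffer (gcWriteBarrier1 reserves one, gcWriteBarrier2 reserves two, and so on), which lets the compiler batch several adjacent pointer stores into a single barrier call.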
Assembly Implementation
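The write barrier is implemented in assembly for performance. The listing below is a simplified sketch based on go1.23.12/src/runtime/asm_amd64.s; operands such as old_ptr, new_ptr, buf_pos, and buf_end are placeholders rather than real symbols: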
TEXT gcWriteBarrier<>(SB),NOSPLIT,$112
	// Save registers that will be clobbered
	MOVQ AX, 0(SP)
	MOVQ BX, 8(SP)
	// ... more register saves
	// Get current P's write buffer
	MOVQ (TLS), AX          // Get g (goroutine)
	MOVQ g_m(AX), AX        // g.m
	MOVQ m_p(AX), AX        // m.p
	LEAQ p_wbBuf(AX), BX    // &p.wbBuf
	// Add old and new pointers to buffer
	MOVQ old_ptr, CX
	MOVQ new_ptr, DX
	// ... buffer insertion logic
	// Check if buffer full
	CMPQ buf_pos, buf_end
	JNE done
	// Buffer full: flush to GC worker queue
	CALL wbBufFlush(SB)
done:
	// Restore registers
	MOVQ 0(SP), AX
	// ... more restores
	RET
Key operations:
- Register preservation: Save/restore caller-saved registers (20-50ns overhead)
- Buffer insertion: Add old+new pointers to per-P buffer
- Flush on full: If buffer full, synchronously flush to GC work queue
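The buffering and flush-on-full behavior listed above can be modeled in a few lines of Go. This is a sketch under stated assumptions: `wbBuf`, `put`, and `flush` are toy names, `bufEntries` is an illustrative capacity rather than the runtime's actual value, and the sketch flushes just before an insertion that would not fit.

```go
package main

import "fmt"

// bufEntries is an illustrative capacity, not the runtime's actual value.
const bufEntries = 8

// wbBuf is a toy per-P write barrier buffer.
type wbBuf struct {
	entries [bufEntries]uintptr
	next    int       // index of the next free slot
	flushes int       // number of flushes performed so far
	queue   []uintptr // stands in for the GC work queue
}

// put records the old and new pointer values, flushing first if they would not fit.
func (b *wbBuf) put(oldPtr, newPtr uintptr) {
	if b.next+2 > len(b.entries) {
		b.flush()
	}
	b.entries[b.next] = oldPtr
	b.entries[b.next+1] = newPtr
	b.next += 2
}

// flush hands the buffered pointers to the collector and resets the buffer.
func (b *wbBuf) flush() {
	for _, p := range b.entries[:b.next] {
		if p != 0 { // nil pointers need no marking
			b.queue = append(b.queue, p)
		}
	}
	b.next = 0
	b.flushes++
}

func main() {
	var b wbBuf
	for i := uintptr(1); i <= 10; i++ {
		b.put(i, i+100)
	}
	// prints: flushes: 2 buffered: 4 queued: 16
	fmt.Println("flushes:", b.flushes, "buffered:", b.next, "queued:", len(b.queue))
}
```

Ten barrier hits cost only two flushes here; with a larger buffer the expensive path becomes correspondingly rarer, which is the point of batching.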
Performance Characteristics
Write barrier cost varies dramatically based on GC state:
| Scenario | Cost | Explanation |
|---|---|---|
| GC inactive | ~5-10ns | Barrier checks a flag and quickly returns |
| GC active, buffer not full | ~20-50ns | Register save/restore + buffer insertion |
| GC active, buffer full | ~200-1000ns | Includes wbBufFlush call (cache pollution) |
Why the cost is worth it:
The expensive flush case (200-1000ns) prevents far more expensive STW pauses. Without write barriers:
- GC would need to rescan all modified objects during STW
- STW duration could reach 10-100ms in worst cases
- Write barrier flush is 100-1000x cheaper than STW rescanning
Buffer Flushing: wbBufFlush
When a write barrier buffer fills, it must be flushed to make room. The flush runs on the system stack (via systemstack) rather than on the goroutine's own stack.
Why systemstack?
The flush operation manipulates GC state and must not be preempted. If preemption occurred mid-flush:
- GC worker might process partially flushed buffer
- Some objects would be missed, causing incorrect collection
Memory Visibility: Publication Barrier
One subtle but critical aspect is ensuring allocation visibility to the GC. When code allocates a new object and stores pointers in it, the GC must not see the object in a partially initialized state.
Why this is necessary:
Without the barrier, CPU/compiler reordering could cause:
1. Object allocated (black by default in Go)
2. GC sees object, decides it's live (already black)
3. Application initializes obj.field
4. But GC already scanned it before initialization!
The publication barrier ensures initialization happens-before GC visibility.
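The same initialize-then-publish ordering shows up in ordinary concurrent Go code. The sketch below is an analogy, not the runtime's mechanism: it uses `sync/atomic` (`atomic.Pointer` requires Go 1.19 or later) where the runtime relies on its own internal barrier, and `config`, `shared`, and `publish` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// config is a hypothetical object that must never be observed half-initialized.
type config struct {
	name  string
	limit int
}

// shared is the publication point that concurrent readers load from.
var shared atomic.Pointer[config]

// publish fully initializes the object before making it visible.
func publish(name string, limit int) {
	c := &config{name: name, limit: limit} // step 1: initialize every field
	shared.Store(c)                         // step 2: publish with an atomic store
}

func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			// A reader that observes the pointer also observes the initialized fields.
			if c := shared.Load(); c != nil {
				fmt.Println(c.name, c.limit)
				return
			}
			runtime.Gosched()
		}
	}()
	publish("demo", 42)
	<-done
}
```

If the pointer were published with a plain store, the reader would have no guarantee of seeing the initialized fields. The atomic store plays the role that the publication barrier plays between the allocator and the collector.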
Advanced Topics
Mutator Assist vs Write Barrier
It's important to distinguish these two mechanisms:
| Aspect | Write Barrier | Mutator Assist |
|---|---|---|
| Trigger | Every pointer assignment | When allocation outpaces marking |
| Purpose | Track pointer modifications | Force marking work |
| Location | At store operation | At malloc operation |
| Cost | Always (when GC active) | Only when in "debt" |
Write barriers are always on during GC. Mutator assist is conditional based on allocation rate.
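The "debt" idea in the table can be illustrated with a toy accounting model. This is a rough sketch, not the runtime's pacing algorithm: `assistState` and `alloc` are hypothetical names, the starting credit is arbitrary, and one byte of debt is assumed to cost one unit of marking work.

```go
package main

import "fmt"

// assistState is a toy model of per-goroutine assist accounting.
type assistState struct {
	credit     int64 // spare credit; a negative value would be debt
	assistWork int64 // total marking work done by the mutator itself
}

// alloc charges size bytes and pays off any resulting debt by "marking".
func (s *assistState) alloc(size int64, gcActive bool) {
	if !gcActive {
		return // no assists outside a GC cycle
	}
	s.credit -= size
	if s.credit < 0 {
		s.assistWork += -s.credit // do marking work proportional to the debt
		s.credit = 0
	}
}

func main() {
	s := &assistState{credit: 100}
	for i := 0; i < 5; i++ {
		s.alloc(64, true)
	}
	// prints: assist work: 220 credit: 0
	fmt.Println("assist work:", s.assistWork, "credit:", s.credit)
}
```

The first allocation is covered by existing credit, so the goroutine does no marking. Every later allocation pushes it into debt and forces it to mark before continuing. A write barrier, by contrast, runs on every instrumented store while the GC is active, regardless of any debt.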
Debugging Write Barriers
You can inspect write barrier activity in your binary:
# Find all write barrier symbols
$ go tool nm myapp | grep gcWriteBarrier
# Disassemble a specific barrier
$ go tool objdump -s "runtime\.gcWriteBarrier" myapp
# Check if barriers are compiled into a specific function
$ go tool objdump -s "github.com/user/pkg\..*Function" myapp | grep -A5 gcWriteBarrier
This is useful for:
- Verifying barriers exist in performance-critical code
- Debugging memory corruption (barrier missing?)
- Understanding compiler optimization decisions
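To try these commands on something small, a program like the following gives a function with two heap pointer stores to inspect. It is only a suggested test subject: `node`, `head`, and `push` are hypothetical names, and `//go:noinline` is used so that `push` survives as a separate symbol.

```go
package main

import "fmt"

// node is a hypothetical linked-list element.
type node struct{ next *node }

// head is a package-level pointer, so stores to it target memory the GC tracks.
var head *node

// push performs two pointer stores that the compiler is expected to instrument.
//
//go:noinline
func push(n *node) {
	n.next = head
	head = n
}

func main() {
	push(&node{})
	push(&node{})
	fmt.Println("list non-empty:", head != nil)
}
```

After building it with `go build -o myapp .`, running `go tool objdump -s 'main\.push' myapp` should show a check of the runtime's write barrier flag and a call to one of the gcWriteBarrier symbols. The exact instructions and symbol names vary by Go version and architecture.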
Summary
Go's write barriers enable concurrent marking by tracking pointer modifications:
- Compiler insertion: Barriers added at every pointer store
- Assembly implementation: Minimal overhead via hand-optimized code
- Per-P buffering: Batch flushing to amortize cost
- Systemstack execution: Ensure correctness during flush
- Publication barriers: Prevent GC from seeing partially initialized objects
Understanding write barriers is essential for:
- Writing GC-friendly code (minimize pointer churn)
- Debugging memory corruption issues
- Analyzing GC-related performance bottlenecks