GC Mutator¶
During the mark phase we encountered mutator handling: after the first STW the write barrier is enabled, and the second STW flushes the mutator buffers and marks the objects recorded there.
Note: read barriers would severely hurt performance; write barriers avoid the need for them. TODO: how does this work? Need to understand.
Note: "mutator" actually refers to user code, like assignment operations.
This section focuses on how the runtime implements mutator tracking, emphasizing how the memory barriers work and how the mutator buffer flush completes. Since the post-flush marking code matches the mark phase, marking itself is not discussed here.
Implementation in Go 1.23.12: src/runtime/mbarrier.go
Core Techniques¶
Two core techniques:
- Yuasa-style deletion barrier
- Dijkstra-style insertion barrier
TODO: Specific algorithms to review later, not critical for this project
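Both ideas come together in the hybrid write barrier Go has used since 1.8: on a pointer store, shade the slot's old value (the Yuasa half) and, while the current goroutine's stack is still unscanned, also shade the new value (the Dijkstra half). A minimal runnable simulation of that rule, with a toy `obj`/`shade` model; all names here are illustrative, not runtime code:

```go
package main

import "fmt"

// Toy object: one pointer field plus a mark bit.
type obj struct {
	field  *obj
	marked bool
}

// shade greys an object; here that just means setting the mark bit.
func shade(o *obj) {
	if o != nil {
		o.marked = true
	}
}

// writePointer sketches the hybrid barrier rule: shade the old value,
// and also shade the new value while the stack is still grey (unscanned).
func writePointer(slot **obj, ptr *obj, stackIsGrey bool) {
	shade(*slot)
	if stackIsGrey {
		shade(ptr)
	}
	*slot = ptr
}

func main() {
	oldObj, newObj := &obj{}, &obj{}
	holder := &obj{field: oldObj}
	writePointer(&holder.field, newObj, true)
	fmt.Println(oldObj.marked, newObj.marked) // true true
}
```

Shading the old value keeps deleted references reachable for this cycle; shading the new value while the stack is grey is what removes the need to re-scan stacks under STW.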
Compiler Insertion¶
Core idea: after GC enables the write barrier, user mutator code executes barrier stubs that the compiler inserted at compile time. The compiler emits multiple write barrier functions (runtime.gcWriteBarrier1 through 8) as an optimization, sized by how many pointer slots a call site needs to reserve.
Example output:
$ go tool nm app | grep gcWriteBarrier
4943c0 t gcWriteBarrier
6a44b40 d github.com/bytedance/sonic/internal/decoder/jitdec._F_gcWriteBarrier2
6a43e00 d github.com/bytedance/sonic/internal/encoder/x86._F_gcWriteBarrier2
6a3b918 d github.com/twitchyliquid64/golang-asm/obj/wasm.gcWriteBarrier
496c80 t runtime.gcWriteBarrier1
496ca0 t runtime.gcWriteBarrier2
496cc0 t runtime.gcWriteBarrier3
496ce0 t runtime.gcWriteBarrier4
496d00 t runtime.gcWriteBarrier5
496d20 t runtime.gcWriteBarrier6
496d40 t runtime.gcWriteBarrier7
496d60 t runtime.gcWriteBarrier8
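For a concrete trigger, any pointer store into a heap object is a candidate call site: the compiler guards the store with a check of the runtime's write-barrier flag and, when it is set, routes the pointers through one of the stubs listed above. A sketch (the types and names below are illustrative):

```go
package main

import "fmt"

type node struct{ next *node }

//go:noinline
func link(a, b *node) {
	// A pointer store into a heap object: the compiler wraps this
	// assignment with a write-barrier check and, when the barrier is
	// enabled, a call into the gcWriteBarrier stubs.
	a.next = b
}

func main() {
	a, b := new(node), new(node)
	link(a, b)
	fmt.Println(a.next == b) // true
}
```

Inspecting this function with `go tool compile -S` shows the barrier branch around the store.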
Assembly Implementation¶
gcWriteBarrier is written in assembly, e.g., go1.23.12/src/runtime/asm_amd64.s. It saves and restores the registers it touches, records the old and new pointers into the current P's wbBuf, and performs a few additional checks.
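The buffering idea can be sketched in Go: pointer pairs accumulate in a small per-P buffer, and only a full buffer pays the cost of handing work to the GC. This is a simulation under assumed names (`wbBuf`, `put` mirror the runtime, but the code below is not runtime code):

```go
package main

import "fmt"

// wbBuf sketches the per-P write-barrier buffer: (old, new) pointer
// pairs are appended cheaply, and a flush to the GC worker queue only
// happens when the buffer fills.
type wbBuf struct {
	buf  []uintptr
	next int
}

func (b *wbBuf) put(old, new uintptr, flush func(n int)) {
	b.buf[b.next] = old
	b.buf[b.next+1] = new
	b.next += 2
	if b.next == len(b.buf) {
		// Slow path: hand the buffered pointers to the GC worker queue.
		flush(b.next)
		b.next = 0
	}
}

func main() {
	flushes := 0
	b := &wbBuf{buf: make([]uintptr, 8)} // room for 4 pairs
	for i := 0; i < 10; i++ {
		b.put(uintptr(i), uintptr(i+1), func(n int) { flushes++ })
	}
	fmt.Println(flushes) // 10 pairs into a 4-pair buffer: 2 flushes
}
```

The design choice is amortization: the common path is two stores and an increment, and the expensive hand-off to the GC is paid once per buffer-full of pointers.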
Performance Characteristics¶
When the write barrier is disabled, the cost is low: only a flag check. When enabled, the register save/restore around the real logic is the relatively expensive part (roughly 20-50 ns).
Scanning the recorded objects happens asynchronously during the mark phase, so it is not counted as write barrier cost.
The slowest path is when GC is active and wbBuf is full: gcWriteBarrier must flush the buffered pointers to the GC worker queue (roughly 200-1000 ns). This also pollutes the L1 cache, and the cycles spent here could have executed other work.
TODO: understanding this cost, my current understanding is poor.
This is a trade-off: the per-write cost buys freedom from far more expensive STW operations.
TODO¶
- TODO: mutator assist: forced marking during malloc; not the write barrier's concern
- TODO: this is a frequently asked topic; the answer mainly covers write barrier logic and its setup versus STW
MECE Framework¶
TODO: Reorganize thoughts using Zettelkasten + MECE
Zettelkasten forces "refactoring": each card covers exactly one independent knowledge point. MECE's core contribution is dimensionalization: choosing the dimensions along which to cut a problem so that nothing is omitted and nothing overlaps. An engineering MECE template:
- What (definition/positioning): What is it? What core pain point does it solve? (e.g., solve STW rescan problem)
- How (core mechanism): What are key data structures? What are key algorithm flows? (e.g., wbBuf, Yuasa+Dijkstra)
- Cost/Trade-off: What costs does it introduce? (e.g., write operations slower, cache pollution)
- Edge Case: What happens in extreme cases? (e.g., a buffer-full flush, or allocation outpacing marking and triggering Assist)