GC Mutator and Write Barriers

The term "mutator" refers to application code that modifies heap state—allocating objects, assigning pointers, and modifying data structures. In a concurrent garbage collector, mutators run simultaneously with the GC, creating a fundamental challenge: how does the GC see changes made while it's marking?

The answer is write barriers: small pieces of compiler-inserted code that run on pointer writes and report them to the GC. This article explores Go's write barrier implementation in src/runtime/mbarrier.go.

Why Write Barriers Matter

Go's GC marks most objects concurrently while application code runs. Without write barriers, the following scenario would cause catastrophic failures:

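// Time T1: GC scans Object A and marks it black (A will not be rescanned)
// B is grey (not yet scanned) and B.ptr = C; C is white/unmarked

// Time T2: Application assigns (without write barrier)
// A.ptr = C  // C is still white!

// Time T3: Application clears the only other reference to C
// B.ptr = nil  // C is now reachable only through black A

// Time T4: GC scans B, finds no pointers, and finishes marking
// C is still white = presumed dead, so GC frees it, but A.ptr still points to it!
// Dereferencing A.ptr is now a use-after-free (memory corruption or a crash)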

Write barriers prevent this by shading (marking grey) any objects involved in pointer assignments during GC.
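
The lost-object problem can be replayed in a toy tricolor marker. The sketch below is a simplified model, not runtime code: the `obj`, `shade`, `scanOne`, and `writePtr` names are hypothetical. It runs one interleaving (black A gains a pointer to white C while the only other reference, held by not-yet-scanned B, is cleared), once without and once with a barrier that shades both the old and the new pointee.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet visited
	grey               // visited, fields not yet scanned
	black              // visited and fully scanned
)

// obj is a toy heap object with a single pointer field.
type obj struct {
	ptr *obj
	c   color
}

// shade turns a white object grey and queues it for scanning.
func shade(o *obj, work *[]*obj) {
	if o != nil && o.c == white {
		o.c = grey
		*work = append(*work, o)
	}
}

// scanOne scans the oldest grey object and marks it black.
func scanOne(work *[]*obj) {
	o := (*work)[0]
	*work = (*work)[1:]
	shade(o.ptr, work)
	o.c = black
}

// writePtr performs *slot = val, optionally preceded by a toy barrier
// that shades both the overwritten and the newly stored object.
func writePtr(slot **obj, val *obj, barrier bool, work *[]*obj) {
	if barrier {
		shade(*slot, work)
		shade(val, work)
	}
	*slot = val
}

// simulate replays the interleaving and reports whether C got marked.
func simulate(barrier bool) bool {
	a, c := &obj{}, &obj{}
	b := &obj{ptr: c}
	var work []*obj

	shade(a, &work) // roots
	shade(b, &work)
	scanOne(&work) // T1: A is scanned and turns black; B is still grey

	writePtr(&a.ptr, c, barrier, &work)   // T2: A.ptr = C
	writePtr(&b.ptr, nil, barrier, &work) // T3: B.ptr = nil

	for len(work) > 0 { // T4: marking runs to completion
		scanOne(&work)
	}
	return c.c == black
}

func main() {
	fmt.Println("without barrier, C marked:", simulate(false)) // false
	fmt.Println("with barrier, C marked:   ", simulate(true))  // true
}
```

Without the barrier C stays white even though A still points to it; with the barrier the store at T2 shades C, so the marker reaches it.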

The Design: Why Write, Not Read?

Go implements write barriers only, avoiding read barriers entirely. This design choice is critical:

  • Read barrier: Every pointer read would check GC state, adding overhead to all pointer dereferences
  • Write barrier: Only pointer assignments trigger extra logic

Why read barriers are unnecessary:

Go's hybrid write barrier combines a Dijkstra-style insertion barrier with a Yuasa-style deletion barrier. Together they maintain the weak tricolor invariant: any white object referenced by a black (scanned) object must also be reachable from a grey object (one queued for scanning), so the marker will still find it.

Because this invariant is upheld entirely at write time, and because Go's collector does not move objects, pointer reads never need to check GC state. Only writes need to update the GC's tracking.

Write Barrier Implementation

Compiler Insertion

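The Go compiler inserts write barrier code at pointer stores that may target the heap; stores it can prove go to the current stack frame need none. The insertion happens during compilation, before machine code is generated: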

type Node struct {
    next *Node
}

func (n *Node) SetNext(other *Node) {
    n.next = other  // Compiler inserts write barrier here
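}

Conceptually, the compiler rewrites the store into something like this (pseudocode, not real Go source):

func (n *Node) SetNext(other *Node) {
    if writeBarrier.enabled {
        // Tell the GC about the old and new pointer values
        gcWriteBarrier(n, &n.next, other)
    }
    n.next = other
}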
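
To see where the compiler places barriers in practice, a small program with one pointer store and one non-pointer store is enough. This is a minimal sketch: the `SetID` method and `id` field are additions for contrast, and `//go:noinline` is used only so each store stays in its own function.

```go
package main

import "fmt"

type Node struct {
	next *Node
	id   int
}

// SetNext stores a pointer into a Node that may live on the heap,
// so the compiler is expected to guard this store with a write barrier.
//
//go:noinline
func (n *Node) SetNext(other *Node) { n.next = other }

// SetID stores a plain integer; no write barrier is needed for it.
//
//go:noinline
func (n *Node) SetID(id int) { n.id = id }

func main() {
	a, b := &Node{}, &Node{}
	a.SetNext(b)
	a.SetID(7)
	fmt.Println(a.next == b, a.id) // true 7
}
```

Building this with `go build -gcflags=-d=wb` should, assuming that compiler debug flag is available in your toolchain, report the source positions that received a barrier. Disassembling the two methods with go tool objdump is another way to compare them.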

Multiple barrier variants exist due to compiler optimizations:

$ go tool nm myapp | grep gcWriteBarrier
  4943c0 t gcWriteBarrier
  496c80 t runtime.gcWriteBarrier1
  496ca0 t runtime.gcWriteBarrier2
  # ... more variants

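In Go 1.21 and later, the numeric suffix indicates how many pointer slots the call reserves in the write barrier buffer (gcWriteBarrier1 reserves one, gcWriteBarrier2 two, and so on), which lets the compiler cover several adjacent pointer stores with a single call.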

Assembly Implementation

The write barrier is implemented in assembly for performance. The listing below is a simplified, pseudo-assembly sketch loosely based on go1.23.12/src/runtime/asm_amd64.s, not the verbatim source:

TEXT gcWriteBarrier<>(SB),NOSPLIT,$112
    // Save registers that will be clobbered
    MOVQ    AX, 0(SP)
    MOVQ    BX, 8(SP)
    // ... more register saves

    // Get current P's write buffer
    MOVQ    (TLS), AX      // Get g (goroutine)
    MOVQ    g_m(AX), AX    // g.m
    MOVQ    m_p(AX), AX    // m.p
    LEAQ    p_wbBuf(AX), BX // &p.wbBuf

    // Add old and new pointers to buffer
    MOVQ    old_ptr, CX
    MOVQ    new_ptr, DX
    // ... buffer insertion logic

    // Check if buffer full
    CMPQ    buf_pos, buf_end
    JNE     done

    // Buffer full: flush to GC worker queue
    CALL    wbBufFlush(SB)

done:
    // Restore registers
    MOVQ    0(SP), AX
    // ... more restores
    RET

Key operations:

  1. Register preservation: Save/restore caller-saved registers (20-50ns overhead)
  2. Buffer insertion: Add old+new pointers to per-P buffer
  3. Flush on full: If buffer full, synchronously flush to GC work queue
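
The buffer-then-flush pattern in the steps above can be modelled in ordinary Go. This is a toy sketch under stated assumptions: the `wbBuf` type here is a hypothetical stand-in (not the runtime's struct), `bufEntries = 4` is deliberately tiny, and the sketch flushes just before an insertion that would not fit.

```go
package main

import "fmt"

const bufEntries = 4 // toy capacity; the real buffer is much larger

// wbBuf is a hypothetical stand-in for a per-P write barrier buffer.
type wbBuf struct {
	ptrs    [bufEntries]uintptr
	next    int       // index of the next free slot
	flushes int       // how many times the slow path ran
	queue   []uintptr // stand-in for the GC work queue
}

// record is the fast path: store the old and new pointer values,
// taking the slow path first if the two entries do not fit.
func (b *wbBuf) record(oldPtr, newPtr uintptr) {
	if b.next+2 > len(b.ptrs) {
		b.flush()
	}
	b.ptrs[b.next] = oldPtr
	b.ptrs[b.next+1] = newPtr
	b.next += 2
}

// flush is the slow path: hand every buffered pointer to the GC queue.
func (b *wbBuf) flush() {
	for _, p := range b.ptrs[:b.next] {
		if p != 0 { // nil pointers need no shading
			b.queue = append(b.queue, p)
		}
	}
	b.next = 0
	b.flushes++
}

func main() {
	var b wbBuf
	for i := uintptr(1); i <= 5; i++ {
		b.record(i, i+100)
	}
	fmt.Println("flushes:", b.flushes, "queued:", len(b.queue), "buffered:", b.next)
	// flushes: 2 queued: 8 buffered: 2
}
```

Most calls only touch the small array; the queue, which stands in for shared GC state, is touched once per batch.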

Performance Characteristics

Write barrier cost varies dramatically based on GC state:

Scenario                   | Cost        | Explanation
---------------------------|-------------|--------------------------------------------
GC inactive                | ~5-10ns     | Barrier checks a flag and quickly returns
GC active, buffer not full | ~20-50ns    | Register save/restore + buffer insertion
GC active, buffer full     | ~200-1000ns | Includes wbBufFlush call (cache pollution)
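
Because the flush happens only once per full buffer, its cost is spread over many barrier calls. As a rough worked example, the sketch below takes the upper-end figures from the table and a hypothetical 256 barrier calls between flushes (an assumed number for illustration, not one taken from the runtime).

```go
package main

import "fmt"

// amortizedNs returns the average cost per barrier call when one flush
// costing flushNs is paid every writesPerFlush calls.
func amortizedNs(fastNs, flushNs float64, writesPerFlush int) float64 {
	return fastNs + flushNs/float64(writesPerFlush)
}

func main() {
	// 50ns fast path, 1000ns flush, 256 calls per flush (assumed).
	fmt.Printf("%.1f ns\n", amortizedNs(50, 1000, 256)) // 53.9 ns
}
```

Under these assumptions the flush adds about 4ns per call on average, so the per-call cost stays close to the fast path.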

Why the cost is worth it:

The expensive flush case (200-1000ns) prevents far more expensive STW pauses. Without write barriers:

  • GC would need to rescan all modified objects during STW
  • STW duration could reach 10-100ms in worst cases

By comparison, a write barrier flush is 100-1000x cheaper than STW rescanning.

Buffer Flushing: wbBufFlush

When a write barrier buffer fills, it must be flushed to make room. The flush runs on the system stack (via systemstack):

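// Simplified sketch of the flush logic; the real runtime code differs in detail.
func wbBufFlush(buf *wbBuf) {
    // Must run on the system stack (no preemption)
    systemstack(func() {
        for i := 0; i < buf.len; i++ {
            oldPtr := buf.old[i] // value that was overwritten
            newPtr := buf.new[i] // value that was stored

            // Shade both old and new objects
            shade(oldPtr)
            shade(newPtr)
        }
        buf.len = 0  // Reset buffer
    })
}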

Why systemstack?

The flush operation manipulates GC state and must not be preempted. If preemption occurred mid-flush:

  • A GC worker might process a partially flushed buffer
  • Some objects would be missed, causing incorrect collection

Memory Visibility: Publication Barrier

One subtle but critical aspect is ensuring allocation visibility to the GC. When code allocates a new object and stores pointers in it, the GC must not see the object in a partially initialized state.

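// Conceptual pseudocode: in the runtime, the barrier sits inside the allocator.
func allocateAndPublish() *Object {
    obj := new(Object)      // Allocate

    obj.field = someValue   // Initialize

    // Publication barrier: ensure all initializing writes complete
    // before the object becomes visible to the GC
    publicationBarrier()

    return obj              // Publish
}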

Why this is necessary:

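Without the barrier, CPU or compiler reordering could produce this sequence:

  1. The object is allocated (objects allocated during marking start out black in Go)
  2. The pointer to the object becomes visible to the GC before the initializing writes do
  3. The GC examines the object's memory while it still holds stale or uninitialized contents
  4. The application's initialization of obj.field lands only afterwards, too late for the GC to see it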

The publication barrier ensures initialization happens-before GC visibility.
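
The same initialize-then-publish ordering shows up in ordinary concurrent Go code. The sketch below is a user-level analogy, not the runtime's mechanism: the `Config` type and `publish` function are hypothetical, and `atomic.Pointer` (Go 1.19+) plays the role of the publication point.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical object that must be fully built before sharing.
type Config struct {
	name  string
	limit int
}

// current is the shared slot that readers poll.
var current atomic.Pointer[Config]

// publish fully initializes the object, then makes it visible.
func publish(name string, limit int) {
	c := &Config{}
	c.name = name
	c.limit = limit
	current.Store(c) // publication point: earlier writes are visible to any Load that sees c
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			if c := current.Load(); c != nil {
				fmt.Println(c.name, c.limit) // demo 42
				return
			}
			runtime.Gosched() // yield while waiting for publication
		}
	}()
	publish("demo", 42)
	wg.Wait()
}
```

A reader that observes the pointer is guaranteed to observe the initialized fields, which is the same guarantee the publication barrier gives the GC.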

Advanced Topics

Mutator Assist vs Write Barrier

It's important to distinguish two mechanisms:

Aspect   | Write Barrier               | Mutator Assist
---------|-----------------------------|---------------------------------
Trigger  | Every pointer assignment    | When allocation outpaces marking
Purpose  | Track pointer modifications | Force marking work
Location | At store operation          | At malloc operation
Cost     | Always (when GC active)     | Only when in "debt"

Write barriers are always on during GC. Mutator assist is conditional based on allocation rate.
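
The "debt" idea behind mutator assist can be illustrated with a toy accounting model. Everything here is hypothetical (the `toyGC` type, the `credit` field, and the `workPerByte` ratio); the runtime's real pacing is considerably more involved.

```go
package main

import "fmt"

// workPerByte is a made-up ratio: units of marking owed per allocated byte.
const workPerByte = 2

// toyGC models only assist accounting, nothing else.
type toyGC struct {
	credit     int // bytes of allocation already paid for by marking
	markedWork int // total units of marking done by assists
}

// alloc charges the allocation against the credit. If the goroutine ends
// up in debt, it performs marking work itself before continuing.
func (g *toyGC) alloc(size int) (assisted bool) {
	g.credit -= size
	if g.credit >= 0 {
		return false // enough credit: allocate without assisting
	}
	debt := -g.credit
	g.markedWork += debt * workPerByte // repay the debt with marking work
	g.credit = 0
	return true
}

func main() {
	g := &toyGC{credit: 100}
	first := g.alloc(60)  // covered by credit
	second := g.alloc(60) // 20 bytes over: must assist
	fmt.Println(first, second, g.markedWork) // false true 40
}
```

In this model the allocation path is the only place where assist work happens, in contrast to write barriers, which fire on stores.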

Debugging Write Barriers

You can inspect write barrier activity in your binary:

# Find all write barrier symbols
$ go tool nm myapp | grep gcWriteBarrier

# Disassemble a specific barrier
$ go tool objdump -s "runtime\.gcWriteBarrier" myapp

# Check if barriers are compiled into a specific function
$ go tool objdump -s "github.com/user/pkg\..*Function" myapp | grep -A5 gcWriteBarrier

This is useful for:

  • Verifying barriers exist in performance-critical code
  • Debugging memory corruption (barrier missing?)
  • Understanding compiler optimization decisions

Summary

Go's write barriers enable concurrent marking by tracking pointer modifications:

  • Compiler insertion: Barriers added at pointer stores that may target the heap
  • Assembly implementation: Minimal overhead via hand-optimized code
  • Per-P buffering: Batch flushing to amortize cost
  • Systemstack execution: Ensure correctness during flush
  • Publication barriers: Prevent GC from seeing partially initialized objects

Understanding write barriers is essential for:

  • Writing GC-friendly code (minimize pointer churn)
  • Debugging memory corruption issues
  • Analyzing GC-related performance bottlenecks
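
One way to reduce pointer churn is to link elements by index instead of by pointer, so that relinking is an integer store. The sketch below is one possible approach with hypothetical `node` and `pool` types; whether it pays off depends on the workload and should be measured.

```go
package main

import "fmt"

// node links to its successor by index rather than by pointer, so
// updating next is a plain integer store that needs no write barrier.
type node struct {
	value int
	next  int32 // index into pool.nodes, or -1 for "none"
}

// pool owns all nodes in one pointer-free slice.
type pool struct{ nodes []node }

// push prepends a value to the list starting at head and returns the new head.
func (p *pool) push(head int32, v int) int32 {
	p.nodes = append(p.nodes, node{value: v, next: head})
	return int32(len(p.nodes) - 1)
}

// sum walks the list from head and adds up the values.
func (p *pool) sum(head int32) int {
	total := 0
	for i := head; i != -1; i = p.nodes[i].next {
		total += p.nodes[i].value
	}
	return total
}

func main() {
	p := &pool{}
	head := int32(-1)
	for v := 1; v <= 4; v++ {
		head = p.push(head, v)
	}
	fmt.Println(p.sum(head)) // 10
}
```

The slice header update in push is still a pointer store, but the links between nodes are not, and a slice of pointer-free structs gives the marker nothing to trace inside it.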

Further Reading