GC Mutator

During the mark phase we encountered mutator handling: after the first STW, write barriers are enabled; the second STW then flushes the mutators' write-barrier buffers and marks the recorded objects.

Note: read barriers would severely hurt performance; write barriers make them unnecessary. TODO: how does this work? Need to understand.

Note: "mutator" refers to user code that mutates the heap, e.g. pointer assignments.

This section focuses on how the runtime implements mutator tracking, emphasizing how the write barrier works and how flushing the mutator buffers completes. Since the post-flush marking code is the same as in the mark phase, marking itself is not covered here.

Implementation in Go 1.23.12: src/runtime/mbarrier.go

Core Techniques

Two core techniques:

  • Yuasa deletion barrier
  • Dijkstra insertion barrier

TODO: Specific algorithms to review later, not critical for this project

Compiler Insertion

Core idea: once GC enables the write barrier, mutators execute barrier code that the compiler inserted at pointer stores at compile time. The compiler may emit multiple write barrier functions as an optimization.

go tool nm myapp | grep gcWriteBarrier
go tool objdump -s "runtime\.gcWriteBarrier" myapp

Example output:

$ go tool nm app  | grep gcWriteBarrier
  4943c0 t gcWriteBarrier
  6a44b40 d github.com/bytedance/sonic/internal/decoder/jitdec._F_gcWriteBarrier2
  6a43e00 d github.com/bytedance/sonic/internal/encoder/x86._F_gcWriteBarrier2
  6a3b918 d github.com/twitchyliquid64/golang-asm/obj/wasm.gcWriteBarrier
  496c80 t runtime.gcWriteBarrier1
  496ca0 t runtime.gcWriteBarrier2
  496cc0 t runtime.gcWriteBarrier3
  496ce0 t runtime.gcWriteBarrier4
  496d00 t runtime.gcWriteBarrier5
  496d20 t runtime.gcWriteBarrier6
  496d40 t runtime.gcWriteBarrier7
  496d60 t runtime.gcWriteBarrier8

Assembly Implementation

gcWriteBarrier is written in assembly, e.g., go1.23.12/src/runtime/asm_amd64.s.

It saves and restores registers, then appends the old and new pointer values to the current P's wbBuf, along with some additional checks.

TEXT gcWriteBarrier<>(SB),NOSPLIT,$112

Performance Characteristics

When the write barrier is not enabled, the cost is low (a flag check and a branch). When it is enabled, the register save/restore around the real logic is relatively expensive (roughly 20-50 ns).

Scanning the recorded objects happens asynchronously during the mark phase, so it is not counted as write-barrier cost.

The slowest case is when GC is running and wbBuf is full: gcWriteBarrier must flush the buffered pointers to the GC worker queue (roughly 200-1000 ns), plus indirect costs such as L1 cache pollution and cycles that could have executed other work.

TODO: understanding this cost, my current understanding is poor.

This is a trade-off: it avoids the far more expensive STW operations.

TODO

  • TODO: mutator assist — forced during malloc; not the write barrier's concern
  • TODO: this will be frequently asked; the answer should mainly cover write-barrier logic and its setup vs. STW

MECE Framework

TODO: Reorganize thoughts using Zettelkasten + MECE

Zettelkasten forces "refactoring": each card covers exactly one independent knowledge point. MECE's core contribution is dimensionalization: choosing which dimensions to cut the problem along so that nothing is omitted and nothing overlaps. Engineering MECE template:

  • What (definition/positioning): What is it? What core pain point does it solve? (e.g., solve STW rescan problem)
  • How (core mechanism): What are key data structures? What are key algorithm flows? (e.g., wbBuf, Yuasa+Dijkstra)
  • Cost/Trade-off: What costs does it introduce? (e.g., write operations slower, cache pollution)
  • Edge Case: What happens in extreme cases? (e.g., Buf full flush, or allocation too fast triggering Assist)

Further Reading