GC Mutator¶
During the mark phase we encountered mutator handling: after the first STW the write barrier is enabled, and the second STW flushes the mutator buffers and marks the objects recorded there.
Note: read barriers would severely hurt performance; write barriers avoid the need for them. TODO: how does this work? Need to understand.
Note: "mutator" actually refers to user code, like assignment operations.
This section focuses on how the runtime implements mutator tracking, emphasizing how the memory barriers work and how the mutator buffer flush completes. Since the post-flush marking code matches the mark phase, marking itself is not discussed here.
Implementation in Go 1.23.12: src/runtime/mbarrier.go
Core Techniques¶
Two core techniques:
- Yuasa-style deletion barrier
- Dijkstra-style insertion barrier
TODO: Specific algorithms to review later, not critical for this project
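Both ideas come together in the hybrid write barrier Go has used since 1.8: on a pointer store, shade the slot's old value (the Yuasa half) and, while the current goroutine's stack is still unscanned, also shade the new value (the Dijkstra half). A minimal runnable simulation of that rule, with a toy `obj`/`shade` model; all names here are illustrative, not runtime code:

```go
package main

import "fmt"

// Toy object: one pointer field plus a mark bit.
type obj struct {
	field  *obj
	marked bool
}

// shade greys an object; here that just means setting the mark bit.
func shade(o *obj) {
	if o != nil {
		o.marked = true
	}
}

// writePointer sketches the hybrid barrier rule: shade the old value,
// and also shade the new value while the stack is still grey (unscanned).
func writePointer(slot **obj, ptr *obj, stackIsGrey bool) {
	shade(*slot)
	if stackIsGrey {
		shade(ptr)
	}
	*slot = ptr
}

func main() {
	oldObj, newObj := &obj{}, &obj{}
	holder := &obj{field: oldObj}
	writePointer(&holder.field, newObj, true)
	fmt.Println(oldObj.marked, newObj.marked) // true true
}
```

Shading the old value keeps deleted references reachable for this cycle; shading the new value while the stack is grey is what removes the need to re-scan stacks under STW.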
Compiler Insertion¶
Core idea: after GC enables the write barrier, user mutator code executes barrier stubs that the compiler inserted at compile time. The compiler emits multiple write barrier functions (runtime.gcWriteBarrier1 through 8) as an optimization, sized by how many pointer slots a call site needs to reserve.
Example output:
$ go tool nm app | grep gcWriteBarrier
4943c0 t gcWriteBarrier
6a44b40 d github.com/bytedance/sonic/internal/decoder/jitdec._F_gcWriteBarrier2
6a43e00 d github.com/bytedance/sonic/internal/encoder/x86._F_gcWriteBarrier2
6a3b918 d github.com/twitchyliquid64/golang-asm/obj/wasm.gcWriteBarrier
496c80 t runtime.gcWriteBarrier1
496ca0 t runtime.gcWriteBarrier2
496cc0 t runtime.gcWriteBarrier3
496ce0 t runtime.gcWriteBarrier4
496d00 t runtime.gcWriteBarrier5
496d20 t runtime.gcWriteBarrier6
496d40 t runtime.gcWriteBarrier7
496d60 t runtime.gcWriteBarrier8
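For a concrete trigger, any pointer store into a heap object is a candidate call site: the compiler guards the store with a check of the runtime's write-barrier flag and, when it is set, routes the pointers through one of the stubs listed above. A sketch (the types and names below are illustrative):

```go
package main

import "fmt"

type node struct{ next *node }

//go:noinline
func link(a, b *node) {
	// A pointer store into a heap object: the compiler wraps this
	// assignment with a write-barrier check and, when the barrier is
	// enabled, a call into the gcWriteBarrier stubs.
	a.next = b
}

func main() {
	a, b := new(node), new(node)
	link(a, b)
	fmt.Println(a.next == b) // true
}
```

Inspecting this function with `go tool compile -S` shows the barrier branch around the store.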
Assembly Implementation¶
gcWriteBarrier is written in assembly, e.g., go1.23.12/src/runtime/asm_amd64.s. It saves and restores the registers it touches, records the old and new pointers into the current P's wbBuf, and performs a few additional checks.
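The buffering idea can be sketched in Go: pointer pairs accumulate in a small per-P buffer, and only a full buffer pays the cost of handing work to the GC. This is a simulation under assumed names (`wbBuf`, `put` mirror the runtime, but the code below is not runtime code):

```go
package main

import "fmt"

// wbBuf sketches the per-P write-barrier buffer: (old, new) pointer
// pairs are appended cheaply, and a flush to the GC worker queue only
// happens when the buffer fills.
type wbBuf struct {
	buf  []uintptr
	next int
}

func (b *wbBuf) put(old, new uintptr, flush func(n int)) {
	b.buf[b.next] = old
	b.buf[b.next+1] = new
	b.next += 2
	if b.next == len(b.buf) {
		// Slow path: hand the buffered pointers to the GC worker queue.
		flush(b.next)
		b.next = 0
	}
}

func main() {
	flushes := 0
	b := &wbBuf{buf: make([]uintptr, 8)} // room for 4 pairs
	for i := 0; i < 10; i++ {
		b.put(uintptr(i), uintptr(i+1), func(n int) { flushes++ })
	}
	fmt.Println(flushes) // 10 pairs into a 4-pair buffer: 2 flushes
}
```

The design choice is amortization: the common path is two stores and an increment, and the expensive hand-off to the GC is paid once per buffer-full of pointers.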
Performance Characteristics¶
When the write barrier is disabled, the cost is low: only a flag check. When enabled, the register save/restore around the real logic is the relatively expensive part (roughly 20-50 ns).
Scanning the recorded objects happens asynchronously during the mark phase, so it is not counted as write barrier cost.
The slowest path is when GC is active and wbBuf is full: gcWriteBarrier must flush the buffered pointers to the GC worker queue (roughly 200-1000 ns). This also pollutes the L1 cache, and the cycles spent here could have executed other work.
TODO: understanding this cost, my current understanding is poor.
This is a trade-off: the per-write cost buys freedom from far more expensive STW operations.
TODO¶
- TODO: mutator assist: forced marking during malloc; not the write barrier's concern
- TODO: this is a frequently asked topic; the answer mainly covers write barrier logic and its setup versus STW
MECE Framework¶
TODO: Reorganize thoughts using Zettelkasten + MECE
Zettelkasten forces "refactoring": each card covers exactly one independent knowledge point. MECE's core contribution is dimensionalization: choosing the dimensions along which to cut a problem so that nothing is omitted and nothing overlaps. An engineering MECE template:
- What (definition/positioning): What is it? What core pain point does it solve? (e.g., solve STW rescan problem)
- How (core mechanism): What are key data structures? What are key algorithm flows? (e.g., wbBuf, Yuasa+Dijkstra)
- Cost/Trade-off: What costs does it introduce? (e.g., write operations slower, cache pollution)
- Edge Case: What happens in extreme cases? (e.g., a buffer-full flush, or allocation outpacing marking and triggering Assist)