GC Mark

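Implementation in Go 1.23.12: src/runtime/mgcmark.go (markroot, gcDrain, scanobject, greyobject), with the phase driver (gcStart, gcBgMarkWorker, gcMarkDone, gcMarkTermination) in src/runtime/mgc.go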

Two STW pauses
How scanning works

STW Before GC Scan

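The entry point is the gcStart function:

Start background mark worker goroutines via gcBgMarkStartWorkers (this happens just before the STW; it has special requirements and can be ignored here)
STW via stopTheWorldWithSema(stwGCSweepTerm), the "sweep termination" pause
finishsweep_m() sweeps whatever spans are still unswept from the previous cycle, so the STW lasts until the heap is fully swept (a large sweep backlog makes this pause long)
Clear sync.Pool caches via clearpools()
if mode != gcBackgroundMode: the mode is non-background only under GODEBUG=gcstoptheworld=1 or 2; then schedEnableUser(false) keeps user goroutines from being scheduled, so the whole cycle is effectively STW. runtime.GC() does not change the mode; it starts a cycle and blocks the caller until the cycle, including sweeping, is complete.
startTheWorldWithSema(0, stw): end the STW and begin concurrent marking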

GC Worker

Background mark workers are started before the mark phase begins.

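Within STW, setGCPhase(_GCmark) enables write barriers
Within STW, gcMarkRootPrepare counts the root jobs (data and BSS segments, finalizer-related span roots, goroutine stacks) that workers will later claim one by one
Within STW, gcMarkTinyAllocs greys each P's active tiny-allocation block. The tiny allocator packs several small pointer-free objects into one 16-byte block. Objects allocated during mark are allocated black, but a tiny object carved out of a block that was already in use before mark started is not a new allocation, so the whole block is greyed to keep it alive.
Within STW, atomic.Store(&gcBlackenEnabled, 1) allows both mutator assists and background mark workers to blacken objects; from here on, allocating goroutines can be charged assist work.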

Because marking is concurrent, gcStart mostly sets up state and returns. The actual scanning comes from two sources: background mark workers, and goroutines forced to assist while they allocate (the assist ratio is computed by the GC pacer).
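The assist mechanism is easier to follow with concrete numbers. Below is a minimal sketch of the debt bookkeeping. It assumes the shape of gcAssistAlloc as I read it in 1.23. Each allocation subtracts its size from the goroutine's gcAssistBytes. When the balance goes negative, the debt in bytes is converted into scan work using the pacer's assistWorkPerByte and rounded up to a minimum (gcOverAssistWork, 64 KiB). That work is paid first by stealing the background workers' banked credit (bgScanCredit). The assistState type and allocate method are hypothetical. The sketch ignores the +1 rounding the runtime applies, and it ignores the case where no mark work is available and the goroutine parks on the assist queue.

```go
package main

import "fmt"

// overAssistWork mirrors the runtime's gcOverAssistWork (64 KiB of scan work):
// a goroutine in debt always does at least this much, building up credit so
// it is not asked to assist on every small allocation.
const overAssistWork = 64 << 10

// assistState is a hypothetical, simplified stand-in for pacer fields the
// runtime keeps in gcController.
type assistState struct {
	assistWorkPerByte  float64 // scan work owed per byte allocated (set by the pacer)
	assistBytesPerWork float64 // inverse: allocation credit earned per unit of scan work
	bgScanCredit       int64   // scan work banked by background mark workers
}

// allocate charges size bytes against a goroutine's balance (g.gcAssistBytes
// in the runtime) and returns the scan work the goroutine must do itself.
func (s *assistState) allocate(balance *int64, size int64) int64 {
	*balance -= size
	if *balance >= 0 {
		return 0 // still in credit: no assist needed
	}
	debtBytes := -*balance
	scanWork := int64(s.assistWorkPerByte * float64(debtBytes))
	if scanWork < overAssistWork {
		scanWork = overAssistWork // over-assist to amortize the cost
		debtBytes = int64(s.assistBytesPerWork * float64(scanWork))
	}
	// Steal banked credit from background workers before scanning ourselves.
	stolen := min(s.bgScanCredit, scanWork)
	if stolen < 0 {
		stolen = 0
	}
	s.bgScanCredit -= stolen
	scanWork -= stolen
	// Assume the remaining scanWork is performed in full, repaying the debt.
	*balance += debtBytes
	return scanWork
}

func main() {
	s := &assistState{assistWorkPerByte: 0.5, assistBytesPerWork: 2}
	var balance int64
	work := s.allocate(&balance, 1000)
	fmt.Println(work, balance) // 65536 130072
	work = s.allocate(&balance, 100)
	fmt.Println(work, balance) // 0 129972
}
```

The over-assist is why a traffic spike shows many goroutines in gcAssistAlloc. Every goroutine that goes into debt pays at least 64 KiB of scan work at once, and it only escapes that cost if background workers have banked enough credit.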

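Background workers are created by gcBgMarkStartWorkers, which keeps the number of gcBgMarkWorker goroutines equal to GOMAXPROCS (one per P). The workers are not bound to a particular P. After initialization each worker signals a ready channel once, then parks itself in a global pool (gcBgMarkWorkerPool) until a P's scheduler picks it up during the mark phase.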

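Each worker records its G and M in a gcBgMarkWorkerNode and disables preemption (acquirem) while it works. This is because the worker uses its P's local gcWork buffer and reads p.gcMarkWorkerMode, so it must not migrate to another P mid-work. When the scheduler does want to preempt it, gcDrain notices, stops draining, and the worker disposes of its gcw before yielding.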

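The worker then loops. When there is nothing to do it parks with gopark. gopark switches to the scheduler, marks the G as waiting, and runs an unlock callback once the G is off its stack. For mark workers that callback releases the M and pushes the node back onto gcBgMarkWorkerPool. A parked G consumes no CPU until something makes it runnable again.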

When the scheduler's findRunnableGCWorker pops the node from the pool and makes the G runnable, the gopark call returns and the worker starts marking.
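The park/wake cycle can be mimicked with ordinary Go channels. A blocking receive stands in for gopark, and a send stands in for the scheduler making the worker runnable. This is only an analogy under stated assumptions. workerNode, workerPool and startAndRun are hypothetical names. The real gcBgMarkWorkerPool is a lock-free stack, and the runtime pushes the node back in gopark's unlock callback, after the G has already stopped running.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// workerNode is a hypothetical stand-in for gcBgMarkWorkerNode.
type workerNode struct {
	id   int
	wake chan string // a blocking receive here plays the role of gopark
}

// workerPool is a hypothetical stand-in for gcBgMarkWorkerPool.
type workerPool struct {
	mu    sync.Mutex
	nodes []*workerNode
}

func (p *workerPool) push(n *workerNode) {
	p.mu.Lock()
	p.nodes = append(p.nodes, n)
	p.mu.Unlock()
}

func (p *workerPool) pop() *workerNode {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.nodes) == 0 {
		return nil
	}
	n := p.nodes[len(p.nodes)-1]
	p.nodes = p.nodes[:len(p.nodes)-1]
	return n
}

// markWorker joins the pool and parks; each time it is woken it reports the
// mode it was given, then returns itself to the pool and parks again.
func markWorker(n *workerNode, p *workerPool, ready chan<- struct{}, done chan<- string) {
	p.push(n)
	ready <- struct{}{} // initialization finished
	for mode := range n.wake {
		done <- fmt.Sprintf("worker %d: %s", n.id, mode)
		p.push(n)
	}
}

// startAndRun starts nworkers workers one at a time, then plays the scheduler:
// it pops every parked worker and wakes each one once with mode.
func startAndRun(nworkers int, mode string) []string {
	p := &workerPool{}
	ready := make(chan struct{})
	done := make(chan string)
	for i := 0; i < nworkers; i++ {
		go markWorker(&workerNode{id: i, wake: make(chan string)}, p, ready, done)
		<-ready // wait for each worker individually
	}
	var parked []*workerNode
	for n := p.pop(); n != nil; n = p.pop() {
		parked = append(parked, n)
	}
	var msgs []string
	for _, n := range parked {
		n.wake <- mode // like findRunnableGCWorker making the G runnable
		msgs = append(msgs, <-done)
	}
	for _, n := range parked {
		close(n.wake) // let the demo goroutines exit
	}
	sort.Strings(msgs)
	return msgs
}

func main() {
	for _, m := range startAndRun(3, "dedicated") {
		fmt.Println(m)
	}
}
```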

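The drain runs inside systemstack, that is, on the M's g0 (system) stack instead of the worker goroutine's own stack. Because the worker does not touch its own stack while marking, that stack can safely be scanned by someone else in the meantime (the next step relies on this).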

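Before draining, casGToWaitingForSuspendG(gp, _Grunning, waitReasonGCWorkerActive) moves the worker from _Grunning to _Gwaiting so that its stack can be scanned. According to the runtime's own comment, this lets two mark workers scan each other's stacks; otherwise they would deadlock. It is safe because the worker only runs on the system stack and stack shrinking is disabled for mark workers. After draining, the status is switched back to _Grunning.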

Three modes (in switch statement):

gcMarkWorkerDedicatedMode → gcDrainMarkWorkerDedicated
gcMarkWorkerFractionalMode → gcDrainMarkWorkerFractional
gcMarkWorkerIdleMode → gcDrainMarkWorkerIdle
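How many workers run in each mode is decided by the pacer at the start of the cycle. The sketch below reproduces that arithmetic as I understand it from gcControllerState.startCycle. The target is 25% of GOMAXPROCS (gcBackgroundUtilization). It is rounded to whole dedicated workers, and if rounding is off by more than 30%, the remainder is covered by a per-P fractional goal. Idle-mode workers are not counted here, since they only run on Ps that would otherwise sit idle. markWorkerPlan is a hypothetical name.

```go
package main

import "fmt"

// backgroundUtilization mirrors the runtime constant gcBackgroundUtilization:
// background mark workers aim to use 25% of GOMAXPROCS.
const backgroundUtilization = 0.25

// markWorkerPlan splits the 25% target into whole dedicated workers plus a
// fractional utilization goal per P.
func markWorkerPlan(procs int) (dedicated int64, fractionalGoal float64) {
	total := float64(procs) * backgroundUtilization
	dedicated = int64(total + 0.5) // round to the nearest whole worker
	utilError := float64(dedicated)/total - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		// Rounding is more than 30% off: never overshoot with dedicated
		// workers, and cover the remainder with fractional workers.
		if float64(dedicated) > total {
			dedicated--
		}
		fractionalGoal = (total - float64(dedicated)) / float64(procs)
	}
	return dedicated, fractionalGoal
}

func main() {
	for _, procs := range []int{1, 2, 3, 4, 6, 8} {
		d, f := markWorkerPlan(procs)
		fmt.Printf("GOMAXPROCS=%d dedicated=%d fractional=%.4f\n", procs, d, f)
	}
}
```

With GOMAXPROCS=8 the result is two dedicated workers and no fractional work. Small or awkward values (1, 2, 3, 6) fall back on fractional workers.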

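Finally, the worker increments work.nwait. If it is the last worker to finish (nwait == nproc) and gcMarkWorkAvailable reports no remaining work, it calls gcMarkDone to try to end the mark phase.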

gcDrain: markroot

The actual marking happens in gcDrain, the core scanning loop. "Drain" means emptying the grey queue: turning grey objects black until none remain (or the worker is told to stop).

Color Theory

White: Potentially garbage (all objects start white at GC start)
Grey: Reachable but referenced objects not yet scanned (in queue)
Black: Reachable and fully scanned live objects (removed from queue)

Note: Marking doesn't tag objects as "live" or "dead"—"dead" is not an independent state, but the result of missing mark tags.

Scanning Process

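Scanning starts from the GC roots (globals and goroutine stacks). Every object is white when the cycle starts. The exceptions are objects allocated during the mark phase, which are allocated black, and the active tiny blocks, which gcMarkTinyAllocs greys. Roots are marked via markroot(gcw, job, flushBgCredit), which handles several kinds of jobs: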

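Global variables in the data and BSS segments
Finalizers: markroot scans the queue of finalizers waiting to run, and the span roots mark everything reachable from objects that have finalizers (but not those objects themselves, so they can still be finalized), plus the finalizer functions
Goroutine stacks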

markroot scans memory in "block" units. The GC does not know what types live inside a block, but the compiler emits a pointer mask (one bit per pointer-sized word) that tells it which words hold pointers.

scanblock does the scanning. For each pointer word that points into the heap, greyobject shades the object (sets its mark bit) and pushes it onto the worker's queue with gcw.putFast, falling back to gcw.put.
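Here is a sketch of the word-by-word scan with a pointer mask. It is a simplified model, not the runtime's scanblock. The heap is a map keyed by hypothetical object addresses, and findObject is skipped: every pointer is assumed to be an object's base address. As in the runtime's greyobject, objects in pointer-free (noscan) spans get their mark bit but are never queued, because there is nothing inside them to scan.

```go
package main

import "fmt"

// toyHeap is a hypothetical heap: every object is identified by its base
// address. marked is the mark bitmap; noscan flags pointer-free objects.
type toyHeap struct {
	marked map[uintptr]bool
	noscan map[uintptr]bool
	grey   []uintptr // stand-in for the gcWork queue
}

// greyObject shades the object at p: set its mark bit and, unless it holds no
// pointers, queue it for scanning. Already-marked objects are ignored.
func (h *toyHeap) greyObject(p uintptr) {
	isMarked, inHeap := h.marked[p]
	if !inHeap || isMarked {
		return
	}
	h.marked[p] = true
	if h.noscan[p] {
		return // pointer-free object: marked, but there is nothing to scan
	}
	h.grey = append(h.grey, p)
}

// scanBlock treats bit i of ptrmask as "word i holds a pointer". Scalar words
// are never interpreted, even if their value looks like a heap address.
func (h *toyHeap) scanBlock(block []uintptr, ptrmask []byte) {
	for i, w := range block {
		if ptrmask[i/8]&(1<<(i%8)) == 0 {
			continue // scalar slot
		}
		if w != 0 {
			h.greyObject(w)
		}
	}
}

func main() {
	h := &toyHeap{
		marked: map[uintptr]bool{0x1000: false, 0x2000: false, 0x3000: false},
		noscan: map[uintptr]bool{0x3000: true},
	}
	// Words 0 and 3 are pointers (mask 0b1001); word 1 is an integer that
	// happens to equal a heap address, word 2 is an ordinary integer.
	block := []uintptr{0x1000, 0x2000, 12345, 0x3000}
	h.scanBlock(block, []byte{0b1001})
	fmt.Println(h.grey, h.marked[0x2000], h.marked[0x3000]) // [4096] false true
}
```

The noscan shortcut is why a heap full of pointer-free data (byte slices, numeric arrays) is cheap to mark: only the pointers that lead to it are followed, and its contents are never read.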

Note: grey and black objects look the same in the mark bitmap (the bit is set for both). They differ only in whether the object is still in a work queue. gcDrain just drains the queue and never checks an object's color.

Some details can be ignored on a first read:
How to find all child references of the current pointer
How the mark bitmap is updated

Draining the queue: pop the current object (grey → black), then shade and enqueue all of its children.
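The grey/black distinction above (a mark bit plus membership in a queue) fits in a few lines. This sketch uses a hypothetical object type and a plain slice as the queue. The order in which grey objects are popped does not change which objects end up marked.

```go
package main

import "fmt"

// object is a hypothetical heap object: only its outgoing pointers matter.
type object struct {
	name string
	refs []*object
}

// mark returns the set of objects reached from roots. Setting the mark bit
// and queueing happen together (white -> grey); popping an object and
// scanning its children makes it black. Objects never marked stay white.
func mark(roots []*object) map[*object]bool {
	marked := map[*object]bool{}
	var queue []*object
	shade := func(o *object) {
		if o != nil && !marked[o] {
			marked[o] = true
			queue = append(queue, o)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	for len(queue) > 0 {
		o := queue[len(queue)-1] // pop: grey -> black
		queue = queue[:len(queue)-1]
		for _, child := range o.refs {
			shade(child)
		}
	}
	return marked
}

func main() {
	c := &object{name: "c"}
	b := &object{name: "b", refs: []*object{c}}
	a := &object{name: "a", refs: []*object{b}}
	c.refs = []*object{a} // a cycle: a is already marked, so it is not queued again
	garbage := &object{name: "garbage", refs: []*object{a}} // points in, but nothing points to it
	marked := mark([]*object{a})
	for _, o := range []*object{a, b, c, garbage} {
		fmt.Println(o.name, marked[o])
	}
}
```

This prints true for a, b and c and false for garbage. An unmarked object is never labeled "dead"; it is simply left white.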

gcDrain: drain heap

gcDrain loops until all reachable objects are scanned (or until the worker is asked to stop).

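Each iteration first tries to get an object from the fast path (gcw.tryGetFast), then the slow path (gcw.tryGet). If both fail, it calls wbBufFlush to move pointers recorded in the current P's write barrier buffer into the queue, then tries once more. If there is still nothing, the loop exits.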

Note: even after this flush, mutators may still hold pointers that need shading. wbBufFlush only flushes the current P's buffer, and the other Ps keep executing write barriers, so nothing is guaranteed to be complete at this point. Later, when all workers finish, gcMarkDone flushes every P and re-checks under the mark termination STW, so no objects are missed. See the mutator (write barrier) implementation for details.
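The sketch below shows the order in which gcDrain looks for work, using a single simulated P. It also shows why that P's flush cannot see other Ps' buffers. The markWorker type and its fields are hypothetical stand-ins for the gcWork cache, the global work list and the P's wbBuf. Real gcDrain also processes root jobs first and checks for preemption, and scanning an object would enqueue its children.

```go
package main

import "fmt"

// markWorker is a hypothetical per-P worker holding the three work sources
// that the drain loop consults.
type markWorker struct {
	local  []uintptr        // this P's gcWork cache (tryGetFast / tryGet)
	global *[]uintptr       // shared list of full buffers, visible to every P
	wbBuf  []uintptr        // pointers recorded by this P's write barrier
	marked map[uintptr]bool // mark bits
}

// tryGet pops from the local cache first, then from the global list.
func (w *markWorker) tryGet() (uintptr, bool) {
	if n := len(w.local); n > 0 {
		p := w.local[n-1]
		w.local = w.local[:n-1]
		return p, true
	}
	if n := len(*w.global); n > 0 {
		p := (*w.global)[n-1]
		*w.global = (*w.global)[:n-1]
		return p, true
	}
	return 0, false
}

// flushWBBuf shades each recorded pointer; only unmarked objects become new work.
func (w *markWorker) flushWBBuf() {
	for _, p := range w.wbBuf {
		if !w.marked[p] {
			w.marked[p] = true
			w.local = append(w.local, p)
		}
	}
	w.wbBuf = w.wbBuf[:0]
}

// drain returns the objects it scanned, in order.
func (w *markWorker) drain() []uintptr {
	var scanned []uintptr
	for {
		p, ok := w.tryGet()
		if !ok {
			w.flushWBBuf() // may create more work
			p, ok = w.tryGet()
		}
		if !ok {
			return scanned // no work visible to this P
		}
		scanned = append(scanned, p) // scanobject(p) would run here
	}
}

func main() {
	global := []uintptr{0x10}
	w := &markWorker{
		local:  []uintptr{0x20},
		global: &global,
		wbBuf:  []uintptr{0x30, 0x20}, // 0x20 is already marked
		marked: map[uintptr]bool{0x10: true, 0x20: true},
	}
	fmt.Println(w.drain()) // [32 16 48]
}
```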

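wbBufFlush switches to the system stack (systemstack) before calling wbBufFlush1. Per the runtime's comment, this is so it does not have to worry about GC safe points while flushing, since it is part of the write barrier implementation.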

Getting an object pops it from the queue, which turns it from grey to black.

The popped object is then scanned with scanobject. Note: markroot scans in block units, while scanobject works in object units.

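Large objects (larger than maxObletBytes, 128 KiB) are split into "oblets" that are enqueued and scanned separately, for better parallelism and lower latency. This can be ignored on a first read. In all cases, scanobject finds every child pointer and enqueues it.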
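The oblet arithmetic is simple enough to check by hand. This sketch assumes maxObletBytes is 128 KiB, as in the runtime, and only lists the scan units. The runtime scans the first oblet right away and enqueues the start addresses of the rest, so other workers can pick them up. splitLargeObject and oblet are hypothetical names.

```go
package main

import "fmt"

// maxObletBytes mirrors the runtime constant of the same name (128 KiB).
const maxObletBytes = 128 << 10

// oblet is one scan unit of a large object: byte offset and length.
type oblet struct{ off, n uintptr }

// splitLargeObject lists the scan units for an object of size elemsize.
// Objects up to maxObletBytes are scanned whole; larger ones are cut into
// 128 KiB oblets, the last one possibly shorter.
func splitLargeObject(elemsize uintptr) []oblet {
	if elemsize <= maxObletBytes {
		return []oblet{{0, elemsize}}
	}
	var out []oblet
	for off := uintptr(0); off < elemsize; off += maxObletBytes {
		out = append(out, oblet{off, min(elemsize-off, maxObletBytes)})
	}
	return out
}

func main() {
	for _, o := range splitLargeObject(300 << 10) { // a 300 KiB object
		fmt.Printf("offset %6d, %6d bytes\n", o.off, o.n)
	}
}
```

A 300 KiB object becomes three units: two of 128 KiB and a 44 KiB tail.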

markDone → mark termination

When all reachable objects are marked, state changes: gc mark → mark termination.

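Only one goroutine at a time can run gcMarkDone (it holds work.markDoneSema). It first re-checks that marking can end: the phase is still _GCmark, all workers are idle, and no work is available. If not, it returns. The conditions under which gcMarkDone backs off are worth understanding, since they often correspond to corner cases and past bugs.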

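It then runs a ragged barrier (forEachP): every P flushes its write barrier buffer and its local gcWork cache. If any P had work to flush, new grey objects may exist, so gcMarkDone jumps back to the top and marking resumes. This resolves the earlier question about write barrier buffers that gcDrain's own flush cannot reach.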

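If the barrier finds nothing, gcMarkDone stops the world (stwGCMarkTerm) to wrap up. Under STW it flushes every P's write barrier buffer again, because mutators may have executed write barriers between the ragged barrier and the STW. If everything is empty, marking is finished. Otherwise it restarts the world and goes back to the top. A typical cycle therefore has two STW pauses (sweep termination and mark termination), but with bad luck mark termination can stop the world more than once.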
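The control flow of this termination check can be modeled in a few lines. The sketch is hypothetical: proc, markDone and the lateWrites injection are not runtime names, and it leaves out the semaphores and the nwait check. It shows why a write barrier executed after the ragged barrier is not lost: the STW recheck sees it, the world is restarted, and the next pass drains it.

```go
package main

import "fmt"

// proc is a hypothetical per-P state: its write barrier buffer and its local
// mark work.
type proc struct {
	wbBuf []uintptr
	work  []uintptr
}

// markDone mimics the control flow of gcMarkDone and returns how many times
// it stopped the world for mark termination. lateWrites[i] is what mutators
// write-barrier into P 0 between the i-th clean ragged barrier and the STW.
func markDone(ps []*proc, lateWrites [][]uintptr, drain func(*proc)) (stwAttempts int) {
	for { // "top"
		// Ragged barrier: every P flushes its write barrier buffer into its
		// local work. Any work found means marking is not done yet.
		found := false
		for _, p := range ps {
			p.work = append(p.work, p.wbBuf...)
			p.wbBuf = nil
			if len(p.work) > 0 {
				found = true
				drain(p) // resume concurrent marking
			}
		}
		if found {
			continue
		}
		// The world is still running: mutators can execute write barriers here.
		if len(lateWrites) > 0 {
			ps[0].wbBuf = append(ps[0].wbBuf, lateWrites[0]...)
			lateWrites = lateWrites[1:]
		}
		stwAttempts++ // stop the world (stwGCMarkTerm)
		restart := false
		for _, p := range ps {
			if len(p.wbBuf) > 0 || len(p.work) > 0 {
				restart = true
			}
		}
		if !restart {
			return stwAttempts // done: mark termination proceeds with the world stopped
		}
		// Otherwise start the world again and go back to the top.
	}
}

func main() {
	drain := func(p *proc) { p.work = nil } // pretend all work gets scanned
	ps := []*proc{{wbBuf: []uintptr{0x1}}, {}}
	fmt.Println(markDone(ps, nil, drain)) // 1
	ps = []*proc{{wbBuf: []uintptr{0x1}}, {}}
	fmt.Println(markDone(ps, [][]uintptr{{0x2}}, drain)) // 2
}
```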

The overall strategy: a brief STW turns on write barriers, so almost all marking runs concurrently with the program. The second STW only confirms that no grey objects remain (flushing the last write barrier buffers) and switches phases. This keeps pauses short and the program available.

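Mark termination also does cleanup work (at the end of gcMarkDone, with the world stopped):
gcComputeStartingStackSize: uses the stack sizes scanned in this cycle to choose the starting stack size for new goroutines
Disable blackening by storing 0 to gcBlackenEnabled: marking is finished, so neither assists nor background workers have work left (this must happen before blocked assists are woken)
gcWakeAllAssists: wakes goroutines that were parked in an assist because they could neither find mark work nor steal enough background credit
gcWakeAllStrongFromWeak: wakes goroutines that were blocked converting a weak pointer into a strong one while mark termination was in progress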

Call gcController.endCycle(now, int(gomaxprocs), work.userForced) to tell GC pacer details about this scanning round.

Then gcMarkTermination updates a bunch of state: memstats, increment GC generation counter, restart the world, etc. Mark phase ends.

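During mark termination (in gcSweep), the GC increments the global mheap_.sweepgen by 2, which starts the next sweep phase. The GC does not sweep everything itself. Instead, the background sweeper (bgsweep) and allocating goroutines sweep spans lazily, on demand. The +2 instantly makes every span "unswept" without touching any span. A sweeper then claims a span with a single CAS on span.sweepgen, so the background sweeper and allocators can sweep concurrently without ever sweeping the same span twice.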
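Here is a minimal sketch of the sweepgen protocol. It assumes the span states documented on mspan.sweepgen: sweepgen == sg-2 means needs sweeping, sg-1 means being swept, and sg means swept. The extra states for spans cached in an mcache are left out. span, heapSweepgen, tryClaim and finishSweep are hypothetical names. Eight goroutines race to sweep four spans, and the CAS guarantees each span is claimed exactly once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// span is a hypothetical span that only tracks its sweep generation.
type span struct{ sweepgen atomic.Uint32 }

// heapSweepgen models mheap_.sweepgen.
var heapSweepgen atomic.Uint32

// startSweepPhase is the +2 done at the end of marking: every span whose
// sweepgen equals the old heap value now reads as sg-2, "needs sweeping".
func startSweepPhase() { heapSweepgen.Add(2) }

// tryClaim moves s from "needs sweeping" (sg-2) to "being swept" (sg-1).
// Only one of any number of concurrent callers can succeed.
func tryClaim(s *span) bool {
	sg := heapSweepgen.Load()
	return s.sweepgen.CompareAndSwap(sg-2, sg-1)
}

// finishSweep marks s as swept for the current cycle (sg).
func finishSweep(s *span) { s.sweepgen.Store(heapSweepgen.Load()) }

func main() {
	heapSweepgen.Store(2)
	spans := make([]span, 4)
	for i := range spans {
		spans[i].sweepgen.Store(2) // swept during the previous cycle
	}
	startSweepPhase() // all four spans now need sweeping

	var swept atomic.Int32
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ { // background sweeper and allocators racing
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range spans {
				if tryClaim(&spans[i]) {
					// ... free unmarked objects in spans[i] here ...
					finishSweep(&spans[i])
					swept.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println("spans swept:", swept.Load()) // spans swept: 4
}
```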

mark and sweep ordering

The ordering of mark and sweep matters: mark N, sweep N, mark N+1, sweep N+1.

Mark N always precedes sweep N, since sweeping relies entirely on the mark bitmaps produced by mark N.

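Sweep N strictly precedes mark N+1. The wait happens at the start of cycle N+1: gcStart first sweeps leftover spans concurrently, then finishsweep_m() sweeps whatever remains under the STW. A new cycle cannot start marking until the previous sweep is complete, because mark N+1 would overwrite the mark bits that sweep N still needs. Usually this costs almost nothing, since sweeping is fast and the interval between cycles is long; only in extreme cases does it lengthen the first STW.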

Q&A

Q1: STW Boundary and Cost

"You mentioned gcStart begins with STW (Sweep Term), gcMarkDone ends with STW (Mark Term). Since Go claims 'concurrent GC', why can't we eliminate these STW pauses? If my service logs show 50ms STW, analyze what might cause this in Sweep Term and Mark Term phases."

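STW1 (sweep termination) turns on write barriers. STW2 (mark termination) flushes the mutators' write barrier buffers and confirms marking is finished. This lets Go mark almost all objects without stopping the world; only the final check of what the mutators recorded happens under STW.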

Within this design, the alternative to these two short pauses is stopping the world for the entire mark. That is far worse: the pause is much longer, and the application is completely unavailable during it.

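Possible causes of a 50ms STW:
Sweep termination STW: the previous cycle's sweep was not finished, so finishsweep_m has a lot left to do; this suggests an allocation surge
Mark termination STW (flushing mutator buffers): the application is write-heavy, and frequent pointer writes keep refilling write barrier buffers, which can force mark termination to stop the world again
Either pause: the time to stop the world itself, i.e. waiting for every running goroutine to reach a safe point
Things to check: QPS, allocation rate, GOMAXPROCS, and OS noise (e.g. CPU throttling).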

TODO: go tool trace

Q2: Mutator Assist Trigger

"We saw atomic.Store(&gcBlackenEnabled, 1) enables assist marking in gcStart. My Go service normally uses 10% CPU, but during traffic spike CPU hits 100% and latency jumps from 20ms to 200ms. Trace shows many goroutines in gcAssistAlloc. How does GC decide a G has 'debt' to repay (to assist marking)? How is this debt calculated? What happens without this mechanism?"

Q3: Write Barrier Essence

"You mentioned wbBufFlush processes write barrier buffers. Go uses Hybrid Write Barrier. Why do we need 'write barriers'? If I disable write barriers during Mark (for performance), what specifically goes wrong? Give concrete pointer assignment example showing how objects get 'lost'."

The write barrier's core goal is to let most object scanning happen without STW. If it were disabled for performance, the concurrent mark phase could not see pointer modifications made during scanning. The two impacts, by severity: 1) a live object is freed, a use-after-free that corrupts memory and can crash the program or silently corrupt data; 2) an object that should be freed survives until the next cycle (floating garbage), a temporary leak that the next GC fixes.
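The lost-object case can be reproduced with a tiny simulation. Object A is already black, B is grey, and C is white and reachable only through B. The mutator copies the pointer to C into A, then clears B's field. Without a barrier, scanning B finds nothing, and C stays white even though A points to it. The barrier in this sketch shades both the old value (deletion barrier) and the new value (insertion barrier). That is loosely modeled on Go's hybrid barrier, which as I understand it records both values in the P's wbBuf and is not applied to stack writes. All types and names here are hypothetical.

```go
package main

import "fmt"

// obj is a hypothetical heap object with a single pointer field.
type obj struct {
	name string
	ref  *obj
}

// collector holds mark bits and the grey queue.
type collector struct {
	marked    map[*obj]bool
	queue     []*obj
	barrierOn bool
}

func (c *collector) shade(o *obj) {
	if o != nil && !c.marked[o] {
		c.marked[o] = true
		c.queue = append(c.queue, o)
	}
}

// writePointer performs *slot = ptr. With the barrier on, it first shades
// the old value (deletion barrier) and the new value (insertion barrier).
func (c *collector) writePointer(slot **obj, ptr *obj) {
	if c.barrierOn {
		c.shade(*slot)
		c.shade(ptr)
	}
	*slot = ptr
}

func (c *collector) drain() {
	for len(c.queue) > 0 {
		o := c.queue[0]
		c.queue = c.queue[1:]
		c.shade(o.ref)
	}
}

// cSurvives reports whether C is marked after the mutator moves the only
// pointer to C from grey B into already-black A in the middle of marking.
func cSurvives(barrierOn bool) bool {
	objC := &obj{name: "C"}
	objB := &obj{name: "B", ref: objC}
	objA := &obj{name: "A"}
	c := &collector{marked: map[*obj]bool{}, barrierOn: barrierOn}
	c.shade(objA)
	c.drain()     // A is black: marked and already scanned
	c.shade(objB) // B is grey: marked, waiting in the queue; C is white

	c.writePointer(&objA.ref, objC) // A.ref = C  (black -> white)
	c.writePointer(&objB.ref, nil)  // B.ref = nil (the grey path to C is gone)

	c.drain()             // scanning B finds nothing
	return c.marked[objC] // false means C would be freed while A still uses it
}

func main() {
	fmt.Println("without barrier, C marked:", cSurvives(false)) // false
	fmt.Println("with barrier, C marked:", cSurvives(true))     // true
}
```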

Q4: Scan Density and Performance

"You analyzed gcDrain and scanobject. Two services: Service A caches 10GB in map[int]*User (User struct has many pointers). Service B caches 10GB in map[int][]byte (image data) or large arrays. Same heap size, which has longer GC Mark phase? Why? What guidance for writing high-performance Go code?"

Q5: Termination Detection Limits

"You described gcMarkDone logic: Flush all P, if work remains goto top. Extreme scenario: P1 flushes and sees empty, P2 still running and creates new object in buffer via write barrier. Does P1 think GC is done? How does Go ensure distributed consensus that 'no work left'?"