STW to Scan Stack¶
Go's garbage collector is concurrent—most heap scanning happens while the application runs. But one critical operation requires stopping the world: stack scanning. This article explains why stacks are special, what makes concurrent stack scanning nearly impossible, and how Go minimizes the STW cost.
The Problem: Moving Stacks¶
Unlike heap objects, goroutine stacks are dynamic and movable:
func recursive(n int) {
if n == 0 {
return
}
var buffer [1024]byte // Uses 1KB of stack space
recursive(n - 1)
}
// Initial stack: 2KB
// After 3 recursive calls: 5KB used → exceeds limit!
// Runtime action:
// 1. Allocate new stack (4KB)
// 2. Copy old stack → new stack
// 3. Update pointers to old stack → new stack
// 4. Resume goroutine
The critical issue: When a stack is copied, every pointer referencing stack variables must be updated. If the GC is scanning concurrently, it might: 1. Scan the old stack (before copy) 2. Mark objects found there 3. Stack gets copied (addresses change!) 4. GC now has stale pointers → objects incorrectly marked dead
Why Stack Scanning Is Hard¶
Challenge 1: Pointers to Stack Variables¶
Stack variables can be referenced from multiple places:
If we scan global while goroutine1's stack is being relocated, we get a dangling pointer.
Challenge 2: Register Pointers¶
Active goroutine variables might live in CPU registers, not memory:
The GC must: 1. Stop the goroutine to safely inspect registers 2. Scan registers for pointers 3. Mark those pointers as roots
Without STW, a goroutine could load a pointer into a register immediately after GC scans registers, causing the pointer to be missed.
Challenge 3: Conservative Scanning Limitations¶
Go uses conservative stack scanning for some registers and ambiguous values:
- Any value that looks like a pointer is treated as a pointer
- Prevents false negatives (missing pointers)
- But causes false positives (non-pointers treated as pointers)
This requires the goroutine to be stopped so the GC can safely interpret register contents.
The STW Solution¶
Go's approach: Scan stacks during brief, controlled STW pauses
func markRoots() {
// 1. Stop the world
stopTheWorldWithSema(stwGCMark)
// 2. For each goroutine:
for _, gp := range allgs {
// 2a. Scan stack frame by frame
for frame := gp.stack.lo; frame < gp.stack.hi; frame = nextFrame(frame) {
scanFrame(frame) // Find all pointers in frame
}
// 2b. Scan registers (if goroutine was running)
if gp.status == _Grunning {
scanRegisters(gp)
}
}
// 3. Resume the world
startTheWorldWithSema(0, stw)
}
Why STW works:
- No stack growth: Goroutines are stopped, so no stacks can move
- Stable register state: Registers are frozen, can be safely scanned
- Atomic snapshot: GC sees a consistent view of all stack roots
Minimizing STW Impact¶
While STW is necessary, Go employs several strategies to minimize its cost:
Per-Goroutine Stack Scanning¶
Stacks are scanned in parallel by GC workers:
| Approach | Complexity | STW Duration |
|---|---|---|
| Sequential | O(N) | N × avg_stack_scan_time |
| Parallel | O(N/P) | (N/P) × avg_stack_scan_time |
Where N = number of goroutines, P = number of GC workers.
Stack Bounds Caching¶
The runtime caches stack boundaries to avoid repeated calculations:
type g struct {
stack struct {
lo uintptr // Cached stack base
hi uintptr // Cached stack limit
guard uintptr // Stack guard page
}
}
This avoids expensive lookups during STW.
Deferred Stack Scanning¶
Some stacks are scanned lazily after STW completes:
func markRoots() {
// STW: Scan only running goroutines' registers
stopTheWorldWithSema(stwGCMark)
for _, gp := range allgs {
if gp.status == _Grunning {
scanRegisters(gp)
} else {
// Mark for later scanning
gp.stackScanPending = true
}
}
startTheWorldWithSema(0, stw)
// After STW: Scan sleeping goroutines' stacks concurrently
for _, gp := range allgs {
if gp.stackScanPending {
scanStack(gp) // Can run concurrently
}
}
}
Rationale: Sleeping goroutines aren't executing, so their stacks won't change. We can scan them after STW completes.
Stack Growth and GC Interaction¶
Stack growth and GC are carefully coordinated:
func morestack() {
// Called when stack overflow detected
// 1. Check if GC is active
if gcPhase == _GCmark {
// Don't grow stack during concurrent mark
// Instead: trigger GC assist to finish marking first
gcAssistAlloc(...)
return
}
// 2. Safe to grow stack (GC not active)
oldStack := g.stack
newStack := allocateStack(newSize)
// 3. Copy old → new
memmove(newStack, oldStack, oldSize)
// 4. Update pointers to old stack
adjustPointers(oldStack, newStack)
}
Why this coordination matters:
If stack growth happened during concurrent mark, the GC might: 1. Scan old stack, mark objects found there 2. Stack grows (old stack → new stack) 3. Pointers updated to new stack 4. Objects referenced from new stack never scanned → incorrectly collected
Alternative Approaches (Not Used in Go)¶
Cheney's Algorithm (Copying Collector)¶
Some GCs copy stacks as part of the collection algorithm:
- Advantage: No separate STW for stack scanning
- Disadvantage: Requires moving collector, incompatible with Go's design
Stack Barriers¶
Insert barriers at function call boundaries to track pointer modifications:
- Advantage: Concurrent stack scanning
- Disadvantage: High overhead on every function call, complex implementation
Go's designers chose brief STW over these alternatives because: - Stack scanning is fast (small, bounded size) - STW duration is predictable - Complexity is manageable
Advanced Topics¶
Register Reporting¶
When a goroutine is stopped for STW, it must report its current register state:
Which registers are scanned?
- General purpose: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
- Argument/return: R8-R15
- Control: RIP, RFLAGS (typically not scanned, not pointers)
Frame Pointer Elimination¶
Go's compiler can omit frame pointers (-lfpo flag), complicating stack scanning:
Impact on GC: Stack scanner must use CFI metadata (generated by compiler) to locate pointers within frames.
Summary¶
Stack scanning requires STW in Go because:
- Stacks move: Concurrent copying + scanning = race conditions
- Register pointers: Only accessible when goroutine is stopped
- Complexity: Alternatives (stack barriers) are too expensive
Go minimizes STW impact through:
- Parallel scanning: Multiple workers scan stacks simultaneously
- Lazy strategies: Defer some stack scanning until after STW
- Bounded cost: Stacks have predictable, limited size
Understanding stack scanning STW is essential for:
- Diagnosing latency spikes from GC pauses
- Tuning goroutine counts (more goroutines = longer STW)
- Optimizing stack sizes (reduce growth frequency)