Skip to content

STW to Scan Stack

Go's garbage collector is concurrent—most heap scanning happens while the application runs. But one critical operation requires stopping the world: stack scanning. This article explains why stacks are special, what makes concurrent stack scanning nearly impossible, and how Go minimizes the STW cost.

The Problem: Moving Stacks

Unlike heap objects, goroutine stacks are dynamic and movable:

func recursive(n int) {
    if n == 0 {
        return
    }
    var buffer [1024]byte  // Uses 1KB of stack space
    recursive(n - 1)
}

// Initial stack: 2KB
// After 3 recursive calls: 5KB used → exceeds limit!

// Runtime action:
// 1. Allocate new stack (4KB)
// 2. Copy old stack → new stack
// 3. Update pointers to old stack → new stack
// 4. Resume goroutine

The critical issue: When a stack is copied, every pointer referencing stack variables must be updated. If the GC is scanning concurrently, it might: 1. Scan the old stack (before copy) 2. Mark objects found there 3. Stack gets copied (addresses change!) 4. GC now has stale pointers → objects incorrectly marked dead

Why Stack Scanning Is Hard

Challenge 1: Pointers to Stack Variables

Stack variables can be referenced from multiple places:

var global *int

func goroutine1() {
    var x int = 42
    global = &x  // Global pointer to stack variable
    // ... more code ...
}

func goroutine2() {
    // While goroutine1 is running:
    value := *global  // Access stack variable from another goroutine
}

If we scan global while goroutine1's stack is being relocated, we get a dangling pointer.

Challenge 2: Register Pointers

Active goroutine variables might live in CPU registers, not memory:

MOVQ AX, 8(SP)   ; Store pointer on stack
; ... do work ...
MOVQ BX, AX      ; Pointer now in register

The GC must: 1. Stop the goroutine to safely inspect registers 2. Scan registers for pointers 3. Mark those pointers as roots

Without STW, a goroutine could load a pointer into a register immediately after GC scans registers, causing the pointer to be missed.

Challenge 3: Conservative Scanning Limitations

Go uses conservative stack scanning for some registers and ambiguous values:

  • Any value that looks like a pointer is treated as a pointer
  • Prevents false negatives (missing pointers)
  • But causes false positives (non-pointers treated as pointers)

This requires the goroutine to be stopped so the GC can safely interpret register contents.

The STW Solution

Go's approach: Scan stacks during brief, controlled STW pauses

func markRoots() {
    // 1. Stop the world
    stopTheWorldWithSema(stwGCMark)

    // 2. For each goroutine:
    for _, gp := range allgs {
        // 2a. Scan stack frame by frame
        for frame := gp.stack.lo; frame < gp.stack.hi; frame = nextFrame(frame) {
            scanFrame(frame)  // Find all pointers in frame
        }

        // 2b. Scan registers (if goroutine was running)
        if gp.status == _Grunning {
            scanRegisters(gp)
        }
    }

    // 3. Resume the world
    startTheWorldWithSema(0, stw)
}

Why STW works:

  • No stack growth: Goroutines are stopped, so no stacks can move
  • Stable register state: Registers are frozen, can be safely scanned
  • Atomic snapshot: GC sees a consistent view of all stack roots

Minimizing STW Impact

While STW is necessary, Go employs several strategies to minimize its cost:

Per-Goroutine Stack Scanning

Stacks are scanned in parallel by GC workers:

Approach Complexity STW Duration
Sequential O(N) N × avg_stack_scan_time
Parallel O(N/P) (N/P) × avg_stack_scan_time

Where N = number of goroutines, P = number of GC workers.

Stack Bounds Caching

The runtime caches stack boundaries to avoid repeated calculations:

type g struct {
    stack struct {
        lo uintptr  // Cached stack base
        hi uintptr  // Cached stack limit
        guard uintptr  // Stack guard page
    }
}

This avoids expensive lookups during STW.

Deferred Stack Scanning

Some stacks are scanned lazily after STW completes:

func markRoots() {
    // STW: Scan only running goroutines' registers
    stopTheWorldWithSema(stwGCMark)

    for _, gp := range allgs {
        if gp.status == _Grunning {
            scanRegisters(gp)
        } else {
            // Mark for later scanning
            gp.stackScanPending = true
        }
    }

    startTheWorldWithSema(0, stw)

    // After STW: Scan sleeping goroutines' stacks concurrently
    for _, gp := range allgs {
        if gp.stackScanPending {
            scanStack(gp)  // Can run concurrently
        }
    }
}

Rationale: Sleeping goroutines aren't executing, so their stacks won't change. We can scan them after STW completes.

Stack Growth and GC Interaction

Stack growth and GC are carefully coordinated:

func morestack() {
    // Called when stack overflow detected

    // 1. Check if GC is active
    if gcPhase == _GCmark {
        // Don't grow stack during concurrent mark
        // Instead: trigger GC assist to finish marking first
        gcAssistAlloc(...)
        return
    }

    // 2. Safe to grow stack (GC not active)
    oldStack := g.stack
    newStack := allocateStack(newSize)

    // 3. Copy old → new
    memmove(newStack, oldStack, oldSize)

    // 4. Update pointers to old stack
    adjustPointers(oldStack, newStack)
}

Why this coordination matters:

If stack growth happened during concurrent mark, the GC might: 1. Scan old stack, mark objects found there 2. Stack grows (old stack → new stack) 3. Pointers updated to new stack 4. Objects referenced from new stack never scanned → incorrectly collected

Alternative Approaches (Not Used in Go)

Cheney's Algorithm (Copying Collector)

Some GCs copy stacks as part of the collection algorithm:

  • Advantage: No separate STW for stack scanning
  • Disadvantage: Requires moving collector, incompatible with Go's design

Stack Barriers

Insert barriers at function call boundaries to track pointer modifications:

  • Advantage: Concurrent stack scanning
  • Disadvantage: High overhead on every function call, complex implementation

Go's designers chose brief STW over these alternatives because: - Stack scanning is fast (small, bounded size) - STW duration is predictable - Complexity is manageable

Advanced Topics

Register Reporting

When a goroutine is stopped for STW, it must report its current register state:

func reportRegisters(gp *g) {
    // Assembly stub to capture registers
    systemstack(func() {
        // In assembly: save all registers to gp.gcregs
        // Then GC scans gp.gcregs for pointers
    })
}

Which registers are scanned?

  • General purpose: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
  • Argument/return: R8-R15
  • Control: RIP, RFLAGS (typically not scanned, not pointers)

Frame Pointer Elimination

Go's compiler can omit frame pointers (-lfpo flag), complicating stack scanning:

With frame pointers:                    Without frame pointers:
BP → previous frame                     BP → ??? (not used)
BP → previous frame                     Must use CFI (Call Frame Info)
BP → previous frame                     to reconstruct frame layout

Impact on GC: Stack scanner must use CFI metadata (generated by compiler) to locate pointers within frames.

Summary

Stack scanning requires STW in Go because:

  1. Stacks move: Concurrent copying + scanning = race conditions
  2. Register pointers: Only accessible when goroutine is stopped
  3. Complexity: Alternatives (stack barriers) are too expensive

Go minimizes STW impact through:

  • Parallel scanning: Multiple workers scan stacks simultaneously
  • Lazy strategies: Defer some stack scanning until after STW
  • Bounded cost: Stacks have predictable, limited size

Understanding stack scanning STW is essential for:

  • Diagnosing latency spikes from GC pauses
  • Tuning goroutine counts (more goroutines = longer STW)
  • Optimizing stack sizes (reduce growth frequency)

Further Reading