Allocation and GC

Memory allocation is the raison d'être of garbage collection—without allocation, there's nothing to collect. But allocation and GC are deeply intertwined in Go's runtime. Every allocation potentially triggers GC, updates GC statistics, and may force the allocating goroutine to assist with marking.

This article explores the tight coupling between allocation and GC in Go's runtime.

The Allocator-GC Feedback Loop

Go's allocator is not just a passive memory dispenser—it actively participates in GC:

  1. Trigger detection: Checks if heap size warrants starting a new GC cycle
  2. Pacer updates: Reports allocation rate and heap usage to GC controller
  3. Mark assist: Forces allocating goroutines to perform marking work when GC falls behind

This creates a feedback loop where allocation influences GC, and GC influences allocation cost.

Triggering GC: When Allocation Starts a Cycle

Trigger Conditions

The allocator checks whether to start a new GC cycle at strategic points:

if shouldhelpgc() {
    if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
        gcStart(t)
    }
}

When does shouldhelpgc() return true?

Allocation Type           Check Frequency
Tiny alloc (≤ 16 bytes)   Only on cache miss → refill from mcentral
Small alloc (17B - 32KB)  Only on cache miss → refill from mcentral
Large alloc (> 32KB)      Every allocation

Small objects cached in per-P cache don't check because the overhead would dominate the allocation cost. Large objects always check because allocating multiple megabytes without checking could cause runaway heap growth.
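The table above can be condensed into a small sketch (hypothetical helper names; in the real runtime this logic is spread across mallocgc and the mcache refill path):

```go
package main

import "fmt"

const maxSmallSize = 32 << 10 // 32KB small-object cutoff

// shouldHelpGC sketches the table above: large allocations always
// check the GC trigger; tiny/small allocations check only when the
// per-P cache misses and must refill from mcentral.
func shouldHelpGC(size int, cacheMiss bool) bool {
	if size > maxSmallSize {
		return true // large alloc: check on every allocation
	}
	return cacheMiss // tiny/small: check only on refill
}

func main() {
	fmt.Println(shouldHelpGC(64<<10, false)) // true: large object
	fmt.Println(shouldHelpGC(100, false))    // false: served from per-P cache
	fmt.Println(shouldHelpGC(100, true))     // true: cache refill
}
```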

Trigger Point Calculation

The gcTrigger.test() call checks if current heap usage exceeds the trigger point calculated by the GC Pacer:

trigger_point = live_heap_last_gc + (heap_target - live_heap_last_gc) / assist_ratio

When heap usage ≥ trigger point, a new GC cycle begins immediately.
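As a worked example of the formula above (with hypothetical numbers; the real pacer in runtime/mgcpacer.go is more involved):

```go
package main

import "fmt"

// triggerPoint mirrors the formula above. The real pacer recomputes
// these inputs continuously; this is a static illustration.
func triggerPoint(liveHeap, heapTarget, assistRatio int64) int64 {
	return liveHeap + (heapTarget-liveHeap)/assistRatio
}

func main() {
	// Hypothetical: 100MB live after last GC, GOGC=100 → 200MB target.
	live := int64(100 << 20)
	target := int64(200 << 20)

	// With an assist ratio of 2, GC starts halfway through the runway:
	// at 150MB of heap usage.
	fmt.Println(triggerPoint(live, target, 2) >> 20) // prints 150 (MB)
}
```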

Mark Assist: Forcing Allocators to Mark

The Credit System

Go's GC implements a credit-based system for mark assist:

// Each goroutine maintains assist credit
var assistCredit int64  // Positive = surplus, Negative = debt

func mallocgc(size, ...) {
    // Deduct the allocation size from this goroutine's credit
    credit := deductAssistCredit(size)

    // If in debt, must work it off first
    if credit < 0 {
        gcAssistAlloc(credit)
    }

    // Only then does the allocation proceed
    return allocate(size)
}

How it works:

  1. Initial credit: Each goroutine starts with zero credit
  2. Earn credit: Assist marking adds positive credit (mark more than allocated)
  3. Spend credit: Allocations deduct from credit (allocate more than marked)
  4. Go into debt: If credit goes negative, goroutine must assist before returning
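The four steps above can be modeled as a standalone sketch (hypothetical type; in the runtime the credit lives on each g as gcAssistBytes):

```go
package main

import "fmt"

// assistState models per-goroutine assist credit in bytes.
// Positive = surplus scan work already done; negative = debt.
type assistState struct{ credit int64 }

// allocate deducts the allocation size and reports whether the
// goroutine must assist before the allocation may return.
func (a *assistState) allocate(size int64) (mustAssist bool) {
	a.credit -= size
	return a.credit < 0
}

// assist records completed mark work, paying down debt.
func (a *assistState) assist(scanned int64) {
	a.credit += scanned
}

func main() {
	var g assistState                  // starts with zero credit
	fmt.Println(g.allocate(4096))      // true: went into debt, must assist
	g.assist(8192)                     // mark work earns credit back
	fmt.Println(g.allocate(2048))      // false: surplus covers this alloc
}
```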

Assist Execution

When a goroutine has negative credit, gcAssistAlloc forces it to perform marking work:

func gcAssistAlloc(debt int64) {
    // Convert the debt (bytes allocated) into required scan work
    workBytes := -debt * assistRatio

    // Perform marking (reuses the gcDrain machinery)
    gcDrainN(gcw, workBytes)

    // Credit is replenished as objects are scanned;
    // once credit >= 0, the assist completes
}

Key characteristic: Assist happens before allocation returns, creating strict backpressure:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Allocate   │───▶│  Assist     │───▶│  Return     │
│  Requested  │    │  Mark       │    │  Pointer    │
└─────────────┘    └─────────────┘    └─────────────┘
    Heap usage      Mark work       Heap grows
    doesn't grow    pays debt       after payment

This ensures heap growth is throttled when GC can't keep up.

Background Worker vs Assist

The GC Pacer reserves at least 25% CPU utilization for background GC workers:

Condition                                   Who does marking?
Allocation rate ≤ background GC capacity    Only background workers
Allocation rate > background GC capacity    Background workers + assists

Credit stealing: When background workers finish early, they "steal" credit from allocating goroutines, reducing assist frequency. This minimizes tail latency for most allocations.
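Credit stealing can be sketched as a global pool that allocating goroutines draw from before assisting (a hypothetical simplification of the runtime's gcController.bgScanCredit, which uses atomics):

```go
package main

import "fmt"

// bgScanCredit models the global pool of surplus scan work
// banked by background workers.
var bgScanCredit int64

// stealCredit lets an allocating goroutine pay down its assist debt
// from the pool; it returns the debt left over that must still be
// worked off by assisting directly.
func stealCredit(debt int64) (remaining int64) {
	steal := min(debt, bgScanCredit)
	bgScanCredit -= steal
	return debt - steal
}

func main() {
	bgScanCredit = 1 << 20                // workers banked 1MB of credit
	fmt.Println(stealCredit(600 << 10))   // 0: debt fully covered, no assist
	fmt.Println(stealCredit(700 << 10))   // leftover debt → goroutine assists
}
```

When the pool covers the whole debt, the allocation returns without any assist work, which is why credit stealing improves tail latency.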

Publication Barrier: Safe Object Visibility

The Problem: Instruction Reordering

When an object is allocated and initialized, CPU or compiler reordering could expose the object to GC before initialization completes:

func makeNode(data *Data) *Node {
    node := new(Node)        // Step 1: Allocate (runtime zeroes memory,
                             // then runs publicationBarrier before returning)

    node.data = data         // Step 2: Initialize
    node.next = nil          // Step 3: Initialize

    return node              // Step 4: Publish
}

// Without the barrier, CPU or compiler might reorder: 1 → 4 → 2 → 3
// GC could see node at step 4 while its memory is still uninitialized!

The Solution: Publication Barrier

Go's allocator inserts a publication barrier to ensure memory initialization happens-before GC visibility:

func mallocgc(...) unsafe.Pointer {
    // 1. Allocate memory
    result = allocate(size)

    // 2. Publication barrier (prevents reordering)
    publicationBarrier()  // CPU fence + compiler barrier

    // 3. Return to caller
    // GC won't see this object until barrier completes
    return result
}

Effect on GC:

  • Allocate-black: objects allocated during a GC cycle are marked live immediately (conservatively assumed reachable)
  • The publication barrier ensures GC never sees partially initialized memory
  • If GC scans the object before initialization completes, it simply marks it live (safe)
  • Initialization then finishes concurrently with the collector, with no race
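The runtime's publicationBarrier is internal, but user code publishing a pointer between goroutines relies on the same happens-before idea; sync/atomic provides the equivalent ordering:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Node struct {
	data int
	next *Node
}

// published holds the shared pointer. atomic.Pointer gives the
// release/acquire ordering analogous to what publicationBarrier
// provides inside the runtime: initialization happens-before
// the pointer becomes visible to readers.
var published atomic.Pointer[Node]

func makeNode(data int) {
	n := &Node{}       // allocate
	n.data = data      // initialize fully...
	published.Store(n) // ...then publish (release ordering)
}

func main() {
	makeNode(42)
	if n := published.Load(); n != nil {
		fmt.Println(n.data) // a reader sees fully initialized fields: 42
	}
}
```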

Large Objects: Delayed Zeroing

Why Large Objects Are Special

When allocating large objects (>32KB), the zeroing operation becomes significant:

  • Small object: Zeroing cost is negligible relative to allocation overhead
  • Large object: Zeroing 2MB might take 500μs, during which the goroutine is unpreemptible

Go solves this with delayed zeroing—zeroing large objects incrementally.

Chunked Zeroing Strategy

func mallocgc(size, ...) unsafe.Pointer {
    largeObject := size > maxSmallSize

    if largeObject {
        // Set flag for delayed zeroing
        delayedZeroing = true
    }

    // Allocate (returns uninitialized memory for large objects)
    result := allocateSpan(size)

    if delayedZeroing {
        // Zero in 256KB chunks, allowing preemption between chunks
        const chunkSize = 256 * 1024 // 256KB, chosen by benchmarking
        remaining := size

        for remaining > 0 {
            // Zero at most one chunk (must not preempt mid-chunk);
            // the final chunk may be smaller than 256KB
            n := min(remaining, chunkSize)
            memclrNoHeapPointers(result, n)

            // Preemption is allowed between chunks
            remaining -= n
            result = add(result, n)
        }
    }

    return result
}

Why 256KB?

This value was chosen through benchmarking in the Go runtime as the sweet spot:

  • Too small (e.g., 4KB): frequent preemption checks → high overhead
  • Too large (e.g., 2MB): long unpreemptible stretch → STW delay

At 256KB, zeroing completes quickly enough to avoid long pauses, but large enough to amortize preemption overhead.
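The trade-off can be quantified with back-of-the-envelope arithmetic (the ~4GB/s zeroing rate below is an assumption for illustration, not a measured runtime constant):

```go
package main

import "fmt"

const chunkSize = 256 << 10 // 256KB

// chunks returns how many zeroing chunks a large allocation needs;
// each chunk boundary is a preemption opportunity.
func chunks(size int) int {
	return (size + chunkSize - 1) / chunkSize // round up
}

func main() {
	size := 2 << 20 // a 2MB object
	fmt.Println(chunks(size)) // prints 8

	// At an assumed ~4GB/s memclr throughput, each 256KB chunk takes
	// ~64μs, so the longest unpreemptible stretch is ~64μs rather than
	// the ~500μs it would take to zero all 2MB at once.
}
```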

STW Impact and Prevention

The chunked zeroing strategy prevents a critical STW issue:

// Bad: zero 2MB atomically
func zeroLargeObject(ptr, size) {
    memclrNoHeapPointers(ptr, size)  // takes ~500μs for 2MB
    // During this 500μs, the goroutine CANNOT be preempted
}

// If STW occurs during zeroing:
// 1. STW signals all P's to stop
// 2. Most P's stop quickly
// 3. The P zeroing 2MB keeps running for up to 500μs (can't stop mid-memclr)
// 4. All other P's wait for this one P
// 5. Result: 500μs STW pause instead of <10μs

// Good: zero 2MB in 8 chunks of 256KB
func zeroLargeObjectChunked(ptr, size) {
    const chunk = 256 << 10  // 256KB
    for offset := 0; offset < size; offset += chunk {
        memclrNoHeapPointers(add(ptr, offset), chunk)  // ~64μs per chunk
        // Check for preemption between chunks
        checkPreemption()
    }
}

// If STW occurs:
// 1. STW signals all P's to stop
// 2. The currently executing chunk finishes (≤ 64μs)
// 3. The goroutine is preempted
// 4. STW completes with minimal delay
The principle: Move long operations out of critical sections by making them interruptible.

Summary

Allocation and GC are tightly coupled in Go:

  • Trigger detection: Allocators monitor heap size and start GC when needed
  • Mark assist: Allocators perform marking work when GC falls behind
  • Publication barriers: Ensure GC sees correctly initialized objects
  • Delayed zeroing: Prevent large allocations from causing STW delays

Understanding these interactions is crucial for:

  • Diagnosing allocation-related performance issues
  • Tuning GOGC for allocation-heavy workloads
  • Designing GC-friendly data structures (minimize pointer churn)

Further Reading