Allocation and GC

Memory allocation is the raison d'être of garbage collection—without allocation, there's nothing to collect. But allocation and GC are deeply intertwined in Go's runtime. Every allocation potentially triggers GC, updates GC statistics, and may force the allocating goroutine to assist with marking.

This article explores the tight coupling between allocation and GC in Go's runtime.

The Allocator-GC Feedback Loop

Go's allocator is not just a passive memory dispenser—it actively participates in GC:

  1. Trigger detection: Checks if heap size warrants starting a new GC cycle
  2. Pacer updates: Reports allocation rate and heap usage to GC controller
  3. Mark assist: Forces allocating goroutines to perform marking work when GC falls behind

This creates a feedback loop where allocation influences GC, and GC influences allocation cost.

Triggering GC: When Allocation Starts a Cycle

Trigger Conditions

The allocator checks whether to start a new GC cycle at strategic points:

if shouldhelpgc() {
    if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
        gcStart(t)
    }
}

When does shouldhelpgc() return true?

Allocation Type           Check Frequency
Tiny alloc (≤ 16 bytes)   Only on cache miss → refill from mcentral
Small alloc (17B - 32KB)  Only on cache miss → refill from mcentral
Large alloc (> 32KB)      Every allocation

Small objects cached in per-P cache don't check because the overhead would dominate the allocation cost. Large objects always check because allocating multiple megabytes without checking could cause runaway heap growth.
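The table above can be condensed into a small sketch (hypothetical helper names; in the real runtime this logic is spread across mallocgc and the mcache refill path):

```go
package main

import "fmt"

const maxSmallSize = 32 << 10 // 32KB small-object cutoff

// shouldHelpGC sketches the table above: large allocations always
// check the GC trigger; tiny/small allocations check only when the
// per-P cache misses and must refill from mcentral.
func shouldHelpGC(size int, cacheMiss bool) bool {
	if size > maxSmallSize {
		return true // large alloc: check on every allocation
	}
	return cacheMiss // tiny/small: check only on refill
}

func main() {
	fmt.Println(shouldHelpGC(64<<10, false)) // true: large object
	fmt.Println(shouldHelpGC(100, false))    // false: served from per-P cache
	fmt.Println(shouldHelpGC(100, true))     // true: cache refill
}
```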

Trigger Point Calculation

The gcTrigger.test() call checks if current heap usage exceeds the trigger point calculated by the GC Pacer:

trigger_point = live_heap_last_gc + (heap_target - live_heap_last_gc) / assist_ratio

When heap usage ≥ trigger point, a new GC cycle begins immediately.
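As a worked example of the formula above (with hypothetical numbers; the real pacer in runtime/mgcpacer.go is more involved):

```go
package main

import "fmt"

// triggerPoint mirrors the formula above. The real pacer recomputes
// these inputs continuously; this is a static illustration.
func triggerPoint(liveHeap, heapTarget, assistRatio int64) int64 {
	return liveHeap + (heapTarget-liveHeap)/assistRatio
}

func main() {
	// Hypothetical: 100MB live after last GC, GOGC=100 → 200MB target.
	live := int64(100 << 20)
	target := int64(200 << 20)

	// With an assist ratio of 2, GC starts halfway through the runway:
	// at 150MB of heap usage.
	fmt.Println(triggerPoint(live, target, 2) >> 20) // prints 150 (MB)
}
```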

Mark Assist: Forcing Allocators to Mark

The Credit System

Go's GC implements a credit-based system for mark assist:

// Each goroutine maintains assist credit
var assistCredit int64  // Positive = surplus, Negative = debt

func mallocgc(size, ...) {
    // Deduct the allocation size from this goroutine's credit
    credit := deductAssistCredit(size)

    // If in debt, must work it off first
    if credit < 0 {
        gcAssistAlloc(credit)
    }

    // Only then does the allocation proceed
    return allocate(size)
}

How it works:

  1. Initial credit: Each goroutine starts with zero credit
  2. Earn credit: Assist marking adds positive credit (mark more than allocated)
  3. Spend credit: Allocations deduct from credit (allocate more than marked)
  4. Go into debt: If credit goes negative, goroutine must assist before returning
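The four steps above can be modeled as a standalone sketch (hypothetical type; in the runtime the credit lives on each g as gcAssistBytes):

```go
package main

import "fmt"

// assistState models per-goroutine assist credit in bytes.
// Positive = surplus scan work already done; negative = debt.
type assistState struct{ credit int64 }

// allocate deducts the allocation size and reports whether the
// goroutine must assist before the allocation may return.
func (a *assistState) allocate(size int64) (mustAssist bool) {
	a.credit -= size
	return a.credit < 0
}

// assist records completed mark work, paying down debt.
func (a *assistState) assist(scanned int64) {
	a.credit += scanned
}

func main() {
	var g assistState                  // starts with zero credit
	fmt.Println(g.allocate(4096))      // true: went into debt, must assist
	g.assist(8192)                     // mark work earns credit back
	fmt.Println(g.allocate(2048))      // false: surplus covers this alloc
}
```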

Assist Execution

When a goroutine has negative credit, gcAssistAlloc forces it to perform marking work:

func gcAssistAlloc(debt int64) {
    // Convert the debt (bytes allocated) into required scan work
    workBytes := -debt * assistRatio

    // Perform marking (reuses the gcDrain machinery)
    gcDrainN(gcw, workBytes)

    // Credit is replenished as objects are scanned;
    // once credit >= 0, the assist completes
}

Key characteristic: Assist happens before allocation returns, creating strict backpressure:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Allocate   │───▶│  Assist     │───▶│  Return     │
│  Requested  │    │  Mark       │    │  Pointer    │
└─────────────┘    └─────────────┘    └─────────────┘
    Heap usage      Mark work       Heap grows
    doesn't grow    pays debt       after payment

This ensures heap growth is throttled when GC can't keep up.

Background Worker vs Assist

The GC Pacer reserves at least 25% CPU utilization for background GC workers:

Condition                                   Who does marking?
Allocation rate ≤ background GC capacity    Only background workers
Allocation rate > background GC capacity    Background workers + assists

Credit stealing: When background workers finish early, they "steal" credit from allocating goroutines, reducing assist frequency. This minimizes tail latency for most allocations.
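Credit stealing can be sketched as a global pool that allocating goroutines draw from before assisting (a hypothetical simplification of the runtime's gcController.bgScanCredit, which uses atomics):

```go
package main

import "fmt"

// bgScanCredit models the global pool of surplus scan work
// banked by background workers.
var bgScanCredit int64

// stealCredit lets an allocating goroutine pay down its assist debt
// from the pool; it returns the debt left over that must still be
// worked off by assisting directly.
func stealCredit(debt int64) (remaining int64) {
	steal := min(debt, bgScanCredit)
	bgScanCredit -= steal
	return debt - steal
}

func main() {
	bgScanCredit = 1 << 20                // workers banked 1MB of credit
	fmt.Println(stealCredit(600 << 10))   // 0: debt fully covered, no assist
	fmt.Println(stealCredit(700 << 10))   // leftover debt → goroutine assists
}
```

When the pool covers the whole debt, the allocation returns without any assist work, which is why credit stealing improves tail latency.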

Publication Barrier: Safe Object Visibility

The Problem: Instruction Reordering

When an object is allocated and initialized, CPU or compiler reordering could expose the object to GC before initialization completes:

func makeNode(data *Data) *Node {
    node := new(Node)        // Step 1: Allocate (runtime zeroes memory,
                             // then runs publicationBarrier before returning)

    node.data = data         // Step 2: Initialize
    node.next = nil          // Step 3: Initialize

    return node              // Step 4: Publish
}

// Without the barrier, CPU or compiler might reorder: 1 → 4 → 2 → 3
// GC could see node at step 4 while its memory is still uninitialized!

The Solution: Publication Barrier

Go's allocator inserts a publication barrier to ensure memory initialization happens-before GC visibility:

func mallocgc(...) unsafe.Pointer {
    // 1. Allocate memory
    result = allocate(size)

    // 2. Publication barrier (prevents reordering)
    publicationBarrier()  // CPU fence + compiler barrier

    // 3. Return to caller
    // GC won't see this object until barrier completes
    return result
}

Effect on GC:

  • Allocate-black: objects allocated during a GC cycle are marked live immediately (conservatively assumed reachable)
  • The publication barrier ensures GC never sees partially initialized memory
  • If GC scans the object before initialization completes, it simply marks it live (safe)
  • Initialization then finishes concurrently with the collector, with no race
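The runtime's publicationBarrier is internal, but user code publishing a pointer between goroutines relies on the same happens-before idea; sync/atomic provides the equivalent ordering:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Node struct {
	data int
	next *Node
}

// published holds the shared pointer. atomic.Pointer gives the
// release/acquire ordering analogous to what publicationBarrier
// provides inside the runtime: initialization happens-before
// the pointer becomes visible to readers.
var published atomic.Pointer[Node]

func makeNode(data int) {
	n := &Node{}       // allocate
	n.data = data      // initialize fully...
	published.Store(n) // ...then publish (release ordering)
}

func main() {
	makeNode(42)
	if n := published.Load(); n != nil {
		fmt.Println(n.data) // a reader sees fully initialized fields: 42
	}
}
```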

Large Objects: Delayed Zeroing

Why Large Objects Are Special

When allocating large objects (>32KB), the zeroing operation becomes significant:

  • Small object: Zeroing cost is negligible relative to allocation overhead
  • Large object: Zeroing 2MB might take 500μs, during which the goroutine is unpreemptible

Go solves this with delayed zeroing—zeroing large objects incrementally.

Chunked Zeroing Strategy

func mallocgc(size, ...) unsafe.Pointer {
    largeObject := size > maxSmallSize

    if largeObject {
        // Set flag for delayed zeroing
        delayedZeroing = true
    }

    // Allocate (returns uninitialized memory for large objects)
    result := allocateSpan(size)

    if delayedZeroing {
        // Zero in 256KB chunks, allowing preemption between chunks
        const chunkSize = 256 * 1024 // 256KB, chosen by benchmarking
        remaining := size

        for remaining > 0 {
            // Zero at most one chunk (must not preempt mid-chunk);
            // the final chunk may be smaller than 256KB
            n := min(remaining, chunkSize)
            memclrNoHeapPointers(result, n)

            // Preemption is allowed between chunks
            remaining -= n
            result = add(result, n)
        }
    }

    return result
}

Why 256KB?

This value was chosen through benchmarking in the Go runtime as the sweet spot:

  • Too small (e.g., 4KB): frequent preemption checks → high overhead
  • Too large (e.g., 2MB): long unpreemptible stretch → STW delay

At 256KB, zeroing completes quickly enough to avoid long pauses, but large enough to amortize preemption overhead.
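The trade-off can be quantified with back-of-the-envelope arithmetic (the ~4GB/s zeroing rate below is an assumption for illustration, not a measured runtime constant):

```go
package main

import "fmt"

const chunkSize = 256 << 10 // 256KB

// chunks returns how many zeroing chunks a large allocation needs;
// each chunk boundary is a preemption opportunity.
func chunks(size int) int {
	return (size + chunkSize - 1) / chunkSize // round up
}

func main() {
	size := 2 << 20 // a 2MB object
	fmt.Println(chunks(size)) // prints 8

	// At an assumed ~4GB/s memclr throughput, each 256KB chunk takes
	// ~64μs, so the longest unpreemptible stretch is ~64μs rather than
	// the ~500μs it would take to zero all 2MB at once.
}
```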

STW Impact and Prevention

The chunked zeroing strategy prevents a critical STW issue:

// Bad: zero 2MB atomically
func zeroLargeObject(ptr, size) {
    memclrNoHeapPointers(ptr, size)  // takes ~500μs for 2MB
    // During this 500μs, the goroutine CANNOT be preempted
}

// If STW occurs during zeroing:
// 1. STW signals all P's to stop
// 2. Most P's stop quickly
// 3. The P zeroing 2MB keeps running for up to 500μs (can't stop mid-memclr)
// 4. All other P's wait for this one P
// 5. Result: 500μs STW pause instead of <10μs

// Good: zero 2MB in 8 chunks of 256KB
func zeroLargeObjectChunked(ptr, size) {
    const chunk = 256 << 10  // 256KB
    for offset := 0; offset < size; offset += chunk {
        memclrNoHeapPointers(add(ptr, offset), chunk)  // ~64μs per chunk
        // Check for preemption between chunks
        checkPreemption()
    }
}

// If STW occurs:
// 1. STW signals all P's to stop
// 2. The currently executing chunk finishes (≤ 64μs)
// 3. The goroutine is preempted
// 4. STW completes with minimal delay
The principle: Move long operations out of critical sections by making them interruptible.

Summary

Allocation and GC are tightly coupled in Go:

  • Trigger detection: Allocators monitor heap size and start GC when needed
  • Mark assist: Allocators perform marking work when GC falls behind
  • Publication barriers: Ensure GC sees correctly initialized objects
  • Delayed zeroing: Prevent large allocations from causing STW delays

Understanding these interactions is crucial for:

  • Diagnosing allocation-related performance issues
  • Tuning GOGC for allocation-heavy workloads
  • Designing GC-friendly data structures (minimize pointer churn)

Further Reading