Contents

List of Algorithms

List of Figures

List of Tables

Preface

Acknowledgements

Authors

1

Introduction

1.1

Explicit deallocation

1.2

Automatic dynamic memory management

1.3

Comparing garbage collection algorithms

Safety

Throughput

Completeness and promptness

Pause time

Space overhead

Optimisations for specific languages

Scalability and portability

1.4

A performance disadvantage?

1.5

Experimental methodology

1.6

Terminology and notation

The heap

The mutator and the collector

The mutator roots

References, fields and addresses

Liveness, correctness and reachability

Pseudo-code

The allocator

Mutator read and write operations

Atomic operations

Sets, multisets, sequences and tuples

2

Mark-sweep garbage collection

2.1

The mark-sweep algorithm

2.2

The tricolour abstraction

2.3

Improving mark-sweep

2.4

Bitmap marking

2.5

Lazy sweeping

2.6

Cache misses in the marking loop

2.7

Issues to consider

Mutator overhead

Throughput

Space usage

To move or not to move?

3

Mark-compact garbage collection

3.1

Two-finger compaction

3.2

The Lisp 2 algorithm

3.3

Threaded compaction

3.4

One-pass algorithms

3.5

Issues to consider

Is compaction necessary?

Throughput costs of compaction

Long-lived data

Locality

Limitations of mark-compact algorithms

4

Copying garbage collection

4.1

Semispace copying collection

Work list implementations

An example

4.2

Traversal order and locality

4.3

Issues to consider

Allocation

Space and locality

Moving objects

5

Reference counting

5.1

Advantages and disadvantages of reference counting

5.2

Improving efficiency

5.3

Deferred reference counting

5.4

Coalesced reference counting

5.5

Cyclic reference counting

5.6

Limited-field reference counting

5.7

Issues to consider

The environment

Advanced solutions

6

Comparing garbage collectors

6.1

Throughput

6.2

Pause time

6.3

Space

6.4

Implementation

6.5

Adaptive systems

6.6

A unified theory of garbage collection

Abstract garbage collection

Tracing garbage collection

Reference counting garbage collection

7

Allocation

7.1

Sequential allocation

7.2

Free-list allocation

First-fit allocation

Next-fit allocation

Best-fit allocation

Speeding free-list allocation

7.3

Fragmentation

7.4

Segregated-fits allocation

Fragmentation

Populating size classes

7.5

Combining segregated-fits with first-, best- and next-fit

7.6

Additional considerations

Alignment

Size constraints

Boundary tags

Heap parsability

Locality

Wilderness preservation

Crossing maps

7.7

Allocation in concurrent systems

7.8

Issues to consider

8

Partitioning the heap

8.1

Terminology

8.2

Why to partition

Partitioning by mobility

Partitioning by size

Partitioning for space

Partitioning by kind

Partitioning for yield

Partitioning for responsiveness

Partitioning for locality

Partitioning by thread

Partitioning by availability

Partitioning by mutability

8.3

How to partition

8.4

When to partition

9

Generational garbage collection

9.1

Example

9.2

Measuring time

9.3

Generational hypotheses

9.4

Generations and heap layout

9.5

Multiple generations

9.6

Age recording

En masse promotion

Aging semispaces

Survivor spaces and flexibility

9.7

Adapting to program behaviour

Appel-style garbage collection

Feedback controlled promotion

9.8

Inter-generational pointers

Remembered sets

Pointer direction

9.9

Space management

9.10

Older-first garbage collection

9.11

Beltway

9.12

Analytic support for generational collection

9.13

Issues to consider

9.14

Abstract generational garbage collection

10

Other partitioned schemes

10.1

Large object spaces

The Treadmill garbage collector

Moving objects with operating system support

Pointer-free objects

10.2

Topological collectors

Mature object space garbage collection

Connectivity-based garbage collection

Thread-local garbage collection

Stack allocation

Region inferencing

10.3

Hybrid mark-sweep, copying collectors

Garbage-First

Immix and others

Copying collection in a constrained memory space

10.4

Bookmarking garbage collection

10.5

Ulterior reference counting

10.6

Issues to consider

11

Run-time interface

11.1

Interface to allocation

Speeding allocation

Zeroing

11.2

Finding pointers

Conservative pointer finding

Accurate pointer finding using tagged values

Accurate pointer finding in objects

Accurate pointer finding in global roots

Accurate pointer finding in stacks and registers

Accurate pointer finding in code

Handling interior pointers

Handling derived pointers

11.3

Object tables

11.4

References from external code

11.5

Stack barriers

11.6

GC-safe points and mutator suspension

11.7

Garbage collecting code

11.8

Read and write barriers

Engineering

Precision of write barriers

Hash tables

Sequential store buffers

Overflow action

Card tables

Crossing maps

Summarising cards

Hardware and virtual memory techniques

Write barrier mechanisms: in summary

Chunked lists

11.9

Managing address space

11.10

Applications of virtual memory page protection

Double mapping

Applications of no-access pages

11.11

Choosing heap size

11.12

Issues to consider

12

Language-specific concerns

12.1

Finalisation

When do finalisers run?

Which thread runs a finaliser?

Can finalisers run concurrently with each other?

Can finalisers access the object that became unreachable?

When are finalised objects reclaimed?

What happens if there is an error in a finaliser?

Is there any guaranteed order to finalisation?

The finalisation race problem

Finalisers and locks

Finalisation in particular languages

For further study

12.2

Weak references

Additional motivations

Supporting multiple pointer strengths

Using Phantom objects to control finalisation order

Race in weak pointer clearing

Notification of weak pointer clearing

Weak pointers in other languages

12.3

Issues to consider

13

Concurrency preliminaries

13.1

Hardware

Processors and threads

Interconnect

Memory

Caches

Coherence

Cache coherence performance example: spin locks

13.2

Hardware memory consistency

Fences and happens-before

Consistency models

13.3

Hardware primitives

Compare-and-swap

Load-linked/store-conditionally

Atomic arithmetic primitives

Test then test-and-set

More powerful primitives

Overheads of atomic primitives

13.4

Progress guarantees

Progress guarantees and concurrent collection

13.5

Notation used for concurrent algorithms

13.6

Mutual exclusion

13.7

Work sharing and termination detection

Rendezvous barriers

13.8

Concurrent data structures

Concurrent stacks

Concurrent queue implemented with singly linked list

Concurrent queue implemented with array

A concurrent deque for work stealing

13.9

Transactional memory

What is transactional memory?

Using transactional memory to help implement collection

Supporting transactional memory in the presence of garbage collection

13.10

Issues to consider

14

Parallel garbage collection

14.1

Is there sufficient work to parallelise?

14.2

Load balancing

14.3

Synchronisation

14.4

Taxonomy

14.5

Parallel marking

Processor-centric techniques

14.6

Parallel copying

Processor-centric techniques

Memory-centric techniques

14.7

Parallel sweeping

14.8

Parallel compaction

14.9

Issues to consider

Terminology

Is parallel collection worthwhile?

Strategies for balancing loads

Managing tracing

Low-level synchronisation

Sweeping and compaction

Termination

15

Concurrent garbage collection

15.1

Correctness of concurrent collection

The tricolour abstraction, revisited

The lost object problem

The strong and weak tricolour invariants

Precision

Mutator colour

Allocation colour

Incremental update solutions

Snapshot-at-the-beginning solutions

15.2

Barrier techniques for concurrent collection

Grey mutator techniques

Black mutator techniques

Completeness of barrier techniques

Concurrent write barrier mechanisms

One-level card tables

Two-level card tables

Reducing work

15.3

Issues to consider

16

Concurrent mark-sweep

16.1

Initialisation

16.2

Termination

16.3

Allocation

16.4

Concurrent marking and sweeping

16.5

On-the-fly marking

Write barriers for on-the-fly collection

Doligez-Leroy-Gonthier

Doligez-Leroy-Gonthier for Java

Sliding views

16.6

Abstract concurrent collection

The collector wavefront

Adding origins

Mutator barriers

Precision

Instantiating collectors

16.7

Issues to consider

17

Concurrent copying & compaction

17.1

Mostly-concurrent copying: Baker’s algorithm

Mostly-concurrent, mostly-copying collection

17.2

Brooks’s indirection barrier

17.3

Self-erasing read barriers

17.4

Replication copying

17.5

Multi-version copying

Extensions to avoid copy-on-write

17.6

Sapphire

Collector phases

Merging phases

Volatile fields

17.7

Concurrent compaction

Compressor

Pauseless

17.8

Issues to consider

18

Concurrent reference counting

18.1

Simple reference counting revisited

18.2

Buffered reference counting

18.3

Concurrent, cyclic reference counting

18.4

Taking a snapshot of the heap

18.5

Sliding views reference counting

Age-oriented collection

The algorithm

Sliding views cycle reclamation

Memory consistency

18.6

Issues to consider

19

Real-time garbage collection

19.1

Real-time systems

19.2

Scheduling real-time collection

19.3

Work-based real-time collection

Parallel, concurrent replication

Uneven work and its impact on work-based scheduling

19.4

Slack-based real-time collection

Scheduling the collector work

Execution overheads

Programmer input

19.5

Time-based real-time collection: Metronome

Mutator utilisation

Supporting predictability

Analysis

Robustness

19.6

Combining scheduling approaches: Tax-and-Spend

Tax-and-Spend scheduling

Tax-and-Spend prerequisites

19.7

Controlling fragmentation

Incremental compaction in Metronome

Incremental replication on uniprocessors

Stopless: lock-free garbage collection

Staccato: best-effort compaction with mutator wait-freedom

Chicken: best-effort compaction with mutator wait-freedom for x86

Clover: guaranteed compaction with probabilistic mutator lock-freedom

Stopless versus Chicken versus Clover

Fragmented allocation

19.8

Issues to consider

Glossary

Bibliography

Index