List of Algorithms |
List of Figures |
List of Tables |
Preface |
Acknowledgements |
Authors |
1 |
Introduction |
|
1.1 |
Explicit deallocation |
|
1.2 |
Automatic dynamic memory management |
|
1.3 |
Comparing garbage collection algorithms |
|
|
Safety |
|
|
Throughput |
|
|
Completeness and promptness |
|
|
Pause time |
|
|
Space overhead |
|
|
Optimisations for specific languages |
|
|
Scalability and portability |
|
1.4 |
A performance disadvantage? |
|
1.5 |
Experimental methodology |
|
1.6 |
Terminology and notation |
|
|
The heap |
|
|
The mutator and the collector |
|
|
The mutator roots |
|
|
References, fields and addresses |
|
|
Liveness, correctness and reachability |
|
|
Pseudo-code |
|
|
The allocator |
|
|
Mutator read and write operations |
|
|
Atomic operations |
|
|
Sets, multisets, sequences and tuples |
2 |
Mark-sweep garbage collection |
|
2.1 |
The mark-sweep algorithm |
|
2.2 |
The tricolour abstraction |
|
2.3 |
Improving mark-sweep |
|
2.4 |
Bitmap marking |
|
2.5 |
Lazy sweeping |
|
2.6 |
Cache misses in the marking loop |
|
2.7 |
Issues to consider |
|
|
Mutator overhead |
|
|
Throughput |
|
|
Space usage |
|
|
To move or not to move? |
3 |
Mark-compact garbage collection |
|
3.1 |
Two-finger compaction |
|
3.2 |
The Lisp 2 algorithm |
|
3.3 |
Threaded compaction |
|
3.4 |
One-pass algorithms |
|
3.5 |
Issues to consider |
|
|
Is compaction necessary? |
|
|
Throughput costs of compaction |
|
|
Long-lived data |
|
|
Locality |
|
|
Limitations of mark-compact algorithms |
4 |
Copying garbage collection |
|
4.1 |
Semispace copying collection |
|
|
Work list implementations |
|
|
An example |
|
4.2 |
Traversal order and locality |
|
4.3 |
Issues to consider |
|
|
Allocation |
|
|
Space and locality |
|
|
Moving objects |
5 |
Reference counting |
|
5.1 |
Advantages and disadvantages of reference counting |
|
5.2 |
Improving efficiency |
|
5.3 |
Deferred reference counting |
|
5.4 |
Coalesced reference counting |
|
5.5 |
Cyclic reference counting |
|
5.6 |
Limited-field reference counting |
|
5.7 |
Issues to consider |
|
|
The environment |
|
|
Advanced solutions |
6 |
Comparing garbage collectors |
|
6.1 |
Throughput |
|
6.2 |
Pause time |
|
6.3 |
Space |
|
6.4 |
Implementation |
|
6.5 |
Adaptive systems |
|
6.6 |
A unified theory of garbage collection |
|
|
Abstract garbage collection |
|
|
Tracing garbage collection |
|
|
Reference counting garbage collection |
7 |
Allocation |
|
7.1 |
Sequential allocation |
|
7.2 |
Free-list allocation |
|
|
First-fit allocation |
|
|
Next-fit allocation |
|
|
Best-fit allocation |
|
|
Speeding free-list allocation |
|
7.3 |
Fragmentation |
|
7.4 |
Segregated-fits allocation |
|
|
Fragmentation |
|
|
Populating size classes |
|
7.5 |
Combining segregated-fits with first-, best- and next-fit |
|
7.6 |
Additional considerations |
|
|
Alignment |
|
|
Size constraints |
|
|
Boundary tags |
|
|
Heap parsability |
|
|
Locality |
|
|
Wilderness preservation |
|
|
Crossing maps |
|
7.7 |
Allocation in concurrent systems |
|
7.8 |
Issues to consider |
8 |
Partitioning the heap |
|
8.1 |
Terminology |
|
8.2 |
Why to partition |
|
|
Partitioning by mobility |
|
|
Partitioning by size |
|
|
Partitioning for space |
|
|
Partitioning by kind |
|
|
Partitioning for yield |
|
|
Partitioning for responsiveness |
|
|
Partitioning for locality |
|
|
Partitioning by thread |
|
|
Partitioning by availability |
|
|
Partitioning by mutability |
|
8.3 |
How to partition |
|
8.4 |
When to partition |
9 |
Generational garbage collection |
|
9.1 |
Example |
|
9.2 |
Measuring time |
|
9.3 |
Generational hypotheses |
|
9.4 |
Generations and heap layout |
|
9.5 |
Multiple generations |
|
9.6 |
Age recording |
|
|
En masse promotion |
|
|
Aging semispaces |
|
|
Survivor spaces and flexibility |
|
9.7 |
Adapting to program behaviour |
|
|
Appel-style garbage collection |
|
|
Feedback controlled promotion |
|
9.8 |
Inter-generational pointers |
|
|
Remembered sets |
|
|
Pointer direction |
|
9.9 |
Space management |
|
9.10 |
Older-first garbage collection |
|
9.11 |
Beltway |
|
9.12 |
Analytic support for generational collection |
|
9.13 |
Issues to consider |
|
9.14 |
Abstract generational garbage collection |
10 |
Other partitioned schemes |
|
10.1 |
Large object spaces |
|
|
The Treadmill garbage collector |
|
|
Moving objects with operating system support |
|
|
Pointer-free objects |
|
10.2 |
Topological collectors |
|
|
Mature object space garbage collection |
|
|
Connectivity-based garbage collection |
|
|
Thread-local garbage collection |
|
|
Stack allocation |
|
|
Region inferencing |
|
10.3 |
Hybrid mark-sweep, copying collectors |
|
|
Garbage-First |
|
|
Immix and others |
|
|
Copying collection in a constrained memory space |
|
10.4 |
Bookmarking garbage collection |
|
10.5 |
Ulterior reference counting |
|
10.6 |
Issues to consider |
11 |
Run-time interface |
|
11.1 |
Interface to allocation |
|
|
Speeding allocation |
|
|
Zeroing |
|
11.2 |
Finding pointers |
|
|
Conservative pointer finding |
|
|
Accurate pointer finding using tagged values |
|
|
Accurate pointer finding in objects |
|
|
Accurate pointer finding in global roots |
|
|
Accurate pointer finding in stacks and registers |
|
|
Accurate pointer finding in code |
|
|
Handling interior pointers |
|
|
Handling derived pointers |
|
11.3 |
Object tables |
|
11.4 |
References from external code |
|
11.5 |
Stack barriers |
|
11.6 |
GC-safe points and mutator suspension |
|
11.7 |
Garbage collecting code |
|
11.8 |
Read and write barriers |
|
|
Engineering |
|
|
Precision of write barriers |
|
|
Hash tables |
|
|
Sequential store buffers |
|
|
Overflow action |
|
|
Card tables |
|
|
Crossing maps |
|
|
Summarising cards |
|
|
Hardware and virtual memory techniques |
|
|
Write barrier mechanisms: in summary |
|
|
Chunked lists |
|
11.9 |
Managing address space |
|
11.10 |
Applications of virtual memory page protection |
|
|
Double mapping |
|
|
Applications of no-access pages |
|
11.11 |
Choosing heap size |
|
11.12 |
Issues to consider |
12 |
Language-specific concerns |
|
12.1 |
Finalisation |
|
|
When do finalisers run? |
|
|
Which thread runs a finaliser? |
|
|
Can finalisers run concurrently with each other? |
|
|
Can finalisers access the object that became unreachable? |
|
|
When are finalised objects reclaimed? |
|
|
What happens if there is an error in a finaliser? |
|
|
Is there any guaranteed order to finalisation? |
|
|
The finalisation race problem |
|
|
Finalisers and locks |
|
|
Finalisation in particular languages |
|
|
For further study |
|
12.2 |
Weak references |
|
|
Additional motivations |
|
|
Supporting multiple pointer strengths |
|
|
Using Phantom objects to control finalisation order |
|
|
Race in weak pointer clearing |
|
|
Notification of weak pointer clearing |
|
|
Weak pointers in other languages |
|
12.3 |
Issues to consider |
13 |
Concurrency preliminaries |
|
13.1 |
Hardware |
|
|
Processors and threads |
|
|
Interconnect |
|
|
Memory |
|
|
Caches |
|
|
Coherence |
|
|
Cache coherence performance example: spin locks |
|
13.2 |
Hardware memory consistency |
|
|
Fences and happens-before |
|
|
Consistency models |
|
13.3 |
Hardware primitives |
|
|
Compare-and-swap |
|
|
Load-linked/store-conditionally |
|
|
Atomic arithmetic primitives |
|
|
Test then test-and-set |
|
|
More powerful primitives |
|
|
Overheads of atomic primitives |
|
13.4 |
Progress guarantees |
|
|
Progress guarantees and concurrent collection |
|
13.5 |
Notation used for concurrent algorithms |
|
13.6 |
Mutual exclusion |
|
13.7 |
Work sharing and termination detection |
|
|
Rendezvous barriers |
|
13.8 |
Concurrent data structures |
|
|
Concurrent stacks |
|
|
Concurrent queue implemented with singly linked list |
|
|
Concurrent queue implemented with array |
|
|
A concurrent deque for work stealing |
|
13.9 |
Transactional memory |
|
|
What is transactional memory? |
|
|
Using transactional memory to help implement collection |
|
|
Supporting transactional memory in the presence of garbage collection |
|
13.10 |
Issues to consider |
14 |
Parallel garbage collection |
|
14.1 |
Is there sufficient work to parallelise? |
|
14.2 |
Load balancing |
|
14.3 |
Synchronisation |
|
14.4 |
Taxonomy |
|
14.5 |
Parallel marking |
|
|
Processor-centric techniques |
|
14.6 |
Parallel copying |
|
|
Processor-centric techniques |
|
|
Memory-centric techniques |
|
14.7 |
Parallel sweeping |
|
14.8 |
Parallel compaction |
|
14.9 |
Issues to consider |
|
|
Terminology |
|
|
Is parallel collection worthwhile? |
|
|
Strategies for balancing loads |
|
|
Managing tracing |
|
|
Low-level synchronisation |
|
|
Sweeping and compaction |
|
|
Termination |
15 |
Concurrent garbage collection |
|
15.1 |
Correctness of concurrent collection |
|
|
The tricolour abstraction, revisited |
|
|
The lost object problem |
|
|
The strong and weak tricolour invariants |
|
|
Precision |
|
|
Mutator colour |
|
|
Allocation colour |
|
|
Incremental update solutions |
|
|
Snapshot-at-the-beginning solutions |
|
15.2 |
Barrier techniques for concurrent collection |
|
|
Grey mutator techniques |
|
|
Black mutator techniques |
|
|
Completeness of barrier techniques |
|
|
Concurrent write barrier mechanisms |
|
|
One-level card tables |
|
|
Two-level card tables |
|
|
Reducing work |
|
15.3 |
Issues to consider |
16 |
Concurrent mark-sweep |
|
16.1 |
Initialisation |
|
16.2 |
Termination |
|
16.3 |
Allocation |
|
16.4 |
Concurrent marking and sweeping |
|
16.5 |
On-the-fly marking |
|
|
Write barriers for on-the-fly collection |
|
|
Doligez-Leroy-Gonthier |
|
|
Doligez-Leroy-Gonthier for Java |
|
|
Sliding views |
|
16.6 |
Abstract concurrent collection |
|
|
The collector wavefront |
|
|
Adding origins |
|
|
Mutator barriers |
|
|
Precision |
|
|
Instantiating collectors |
|
16.7 |
Issues to consider |
17 |
Concurrent copying & compaction |
|
17.1 |
Mostly-concurrent copying: Baker’s algorithm |
|
|
Mostly-concurrent, mostly-copying collection |
|
17.2 |
Brooks’s indirection barrier |
|
17.3 |
Self-erasing read barriers |
|
17.4 |
Replication copying |
|
17.5 |
Multi-version copying |
|
|
Extensions to avoid copy-on-write |
|
17.6 |
Sapphire |
|
|
Collector phases |
|
|
Merging phases |
|
|
Volatile fields |
|
17.7 |
Concurrent compaction |
|
|
Compressor |
|
|
Pauseless |
|
17.8 |
Issues to consider |
18 |
Concurrent reference counting |
|
18.1 |
Simple reference counting revisited |
|
18.2 |
Buffered reference counting |
|
18.3 |
Concurrent, cyclic reference counting |
|
18.4 |
Taking a snapshot of the heap |
|
18.5 |
Sliding views reference counting |
|
|
Age-oriented collection |
|
|
The algorithm |
|
|
Sliding views cycle reclamation |
|
|
Memory consistency |
|
18.6 |
Issues to consider |
19 |
Real-time garbage collection |
|
19.1 |
Real-time systems |
|
19.2 |
Scheduling real-time collection |
|
19.3 |
Work-based real-time collection |
|
|
Parallel, concurrent replication |
|
|
Uneven work and its impact on work-based scheduling |
|
19.4 |
Slack-based real-time collection |
|
|
Scheduling the collector work |
|
|
Execution overheads |
|
|
Programmer input |
|
19.5 |
Time-based real-time collection: Metronome |
|
|
Mutator utilisation |
|
|
Supporting predictability |
|
|
Analysis |
|
|
Robustness |
|
19.6 |
Combining scheduling approaches: Tax-and-Spend |
|
|
Tax-and-Spend scheduling |
|
|
Tax-and-Spend prerequisites |
|
19.7 |
Controlling fragmentation |
|
|
Incremental compaction in Metronome |
|
|
Incremental replication on uniprocessors |
|
|
Stopless: lock-free garbage collection |
|
|
Staccato: best-effort compaction with mutator wait-freedom |
|
|
Chicken: best-effort compaction with mutator wait-freedom for x86 |
|
|
Clover: guaranteed compaction with probabilistic mutator lock-freedom |
|
|
Stopless versus Chicken versus Clover |
|
|
Fragmented allocation |
|
19.8 |
Issues to consider |
Glossary |
Bibliography |
Index |