In Praise of Computer Architecture: A Quantitative Approach Sixth Edition
1. Fundamentals of Quantitative Design and Analysis
1.3 Defining Computer Architecture
1.5 Trends in Power and Energy in Integrated Circuits
1.8 Measuring, Reporting, and Summarizing Performance
1.9 Quantitative Principles of Computer Design
1.10 Putting It All Together: Performance, Price, and Power
1.13 Historical Perspectives and References
Case Studies and Exercises by Diana Franklin
2. Memory Hierarchy Design
2.2 Memory Technology and Optimizations
2.3 Ten Advanced Optimizations of Cache Performance
2.4 Virtual Memory and Virtual Machines
2.5 Cross-Cutting Issues: The Design of Memory Hierarchies
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700
2.8 Concluding Remarks: Looking Ahead
2.9 Historical Perspectives and References
3. Instruction-Level Parallelism and Its Exploitation
3.1 Instruction-Level Parallelism: Concepts and Challenges
3.2 Basic Compiler Techniques for Exposing ILP
3.3 Reducing Branch Costs With Advanced Branch Prediction
3.4 Overcoming Data Hazards With Dynamic Scheduling
3.5 Dynamic Scheduling: Examples and the Algorithm
3.6 Hardware-Based Speculation
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
3.9 Advanced Techniques for Instruction Delivery and Speculation
3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
3.14 Concluding Remarks: What's Ahead?
3.15 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
4.3 SIMD Instruction Set Extensions for Multimedia
4.5 Detecting and Enhancing Loop-Level Parallelism
4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7
4.10 Historical Perspective and References
Case Study and Exercises by Jason D. Bakos
5. Thread-Level Parallelism
5.2 Centralized Shared-Memory Architectures
5.3 Performance of Symmetric Shared-Memory Multiprocessors
5.4 Distributed Shared-Memory and Directory-Based Coherence
5.5 Synchronization: The Basics
5.6 Models of Memory Consistency: An Introduction
5.8 Putting It All Together: Multicore Processors and Their Performance
5.10 The Future of Multicore Scaling
5.12 Historical Perspectives and References
Case Studies and Exercises by Amr Zaky and David A. Wood
6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
6.2 Programming Models and Workloads for Warehouse-Scale Computers
6.3 Computer Architecture of Warehouse-Scale Computers
6.4 The Efficiency and Cost of Warehouse-Scale Computers
6.5 Cloud Computing: The Return of Utility Computing
6.7 Putting It All Together: A Google Warehouse-Scale Computer
6.10 Historical Perspectives and References
Case Studies and Exercises by Parthasarathy Ranganathan
7. Domain-Specific Architectures
7.3 Example Domain: Deep Neural Networks
7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator
7.5 Microsoft Catapult, a Flexible Data Center Accelerator
7.6 Intel Crest, a Data Center Accelerator for Training
7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit
7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators
7.12 Historical Perspectives and References
Appendix A. Instruction Set Principles
A.2 Classifying Instruction Set Architectures
A.5 Operations in the Instruction Set
A.6 Instructions for Control Flow
A.7 Encoding an Instruction Set
A.8 Cross-Cutting Issues: The Role of Compilers
A.9 Putting It All Together: The RISC-V Architecture
Appendix B. Review of Memory Hierarchy
B.3 Six Basic Cache Optimizations
B.5 Protection and Examples of Virtual Memory
B.8 Historical Perspective and References
Appendix C. Pipelining: Basic and Intermediate Concepts
C.2 The Major Hurdle of Pipelining—Pipeline Hazards
C.3 How Is Pipelining Implemented?
C.4 What Makes Pipelining Hard to Implement?
C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations
C.6 Putting It All Together: The MIPS R4000 Pipeline
C.10 Historical Perspective and References
Updated Exercises by Diana Franklin
Appendix D. Storage Systems
D.2 Advanced Topics in Disk Storage
D.3 Definition and Examples of Real Faults and Failures
D.4 I/O Performance, Reliability Measures, and Benchmarks
D.7 Designing and Evaluating an I/O System—The Internet Archive Cluster
D.8 Putting It All Together: NetApp FAS6000 Filer
D.11 Historical Perspective and References
Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau
Appendix E. Embedded Systems
E.2 Signal Processing and Embedded Applications: The Digital Signal Processor
E.5 Case Study: The Emotion Engine of the Sony PlayStation 2
E.6 Case Study: Sanyo VPC-SX500 Digital Camera
E.7 Case Study: Inside a Cell Phone
Appendix F. Interconnection Networks
F.2 Interconnecting Two Devices
F.3 Connecting More than Two Devices
F.5 Network Routing, Arbitration, and Switching
F.7 Practical Issues for Commercial Interconnection Networks
F.8 Examples of Interconnection Networks
F.10 Cross-Cutting Issues for Interconnection Networks
F.13 Historical Perspective and References
Appendix G. Vector Processors in More Depth
G.2 Vector Performance in More Depth
G.3 Vector Memory Systems in More Depth
G.4 Enhancing Vector Performance
G.5 Effectiveness of Compiler Vectorization
G.6 Putting It All Together: Performance of Vector Processors
G.7 A Modern Vector Supercomputer: The Cray X1
G.9 Historical Perspective and References
Appendix H. Hardware and Software for VLIW and EPIC
H.1 Introduction: Exploiting Instruction-Level Parallelism Statically
H.2 Detecting and Enhancing Loop-Level Parallelism
H.3 Scheduling and Structuring Code for Parallelism
H.4 Hardware Support for Exposing Parallelism: Predicated Instructions
H.5 Hardware Support for Compiler Speculation
H.6 The Intel IA-64 Architecture and Itanium Processor
Appendix I. Large-Scale Multiprocessors and Scientific Applications
I.2 Interprocessor Communication: The Critical Performance Issue
I.3 Characteristics of Scientific Applications
I.4 Synchronization: Scaling Up
I.5 Performance of Scientific Applications on Shared-Memory Multiprocessors
I.6 Performance Measurement of Parallel Processors with Scientific Applications
I.7 Implementing Cache Coherence
I.8 The Custom Cluster Approach: Blue Gene/L
Appendix J. Computer Arithmetic
J.2 Basic Techniques of Integer Arithmetic
J.4 Floating-Point Multiplication
J.7 More on Floating-Point Arithmetic
J.8 Speeding Up Integer Addition
J.9 Speeding Up Integer Multiplication and Division
J.12 Historical Perspective and References
Appendix K. Survey of Instruction Set Architectures
K.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers
K.5 The IBM 360/370 Architecture for Mainframe Computers
K.6 Historical Perspective and References
Appendix L. Advanced Concepts on Address Translation
Appendix M. Historical Perspectives and References
M.2 The Early Development of Computers (Chapter 1)
M.3 The Development of Memory Hierarchy and Protection (Chapter 2 and Appendix B)
M.4 The Evolution of Instruction Sets (Appendices A, J, and K)
M.7 The History of Multiprocessors and Parallel Processing (Chapter 5 and Appendices F, G, and I)
M.8 The Development of Clusters (Chapter 6)
M.9 Historical Perspectives and References
M.10 The History of Magnetic Storage, RAID, and I/O Buses (Appendix D)