Cover image
Title page
Table of Contents
Inside Front Cover
In Praise of Computer Architecture: A Quantitative Approach Sixth Edition
Copyright
Dedication
Foreword
Preface
Why We Wrote This Book
This Edition
Topic Selection and Organization
An Overview of the Content
Navigating the Text
Chapter Structure
Case Studies With Exercises
Supplemental Materials
Helping Improve This Book
Concluding Remarks
Acknowledgments
1. Fundamentals of Quantitative Design and Analysis
Abstract
1.1 Introduction
1.2 Classes of Computers
1.3 Defining Computer Architecture
1.4 Trends in Technology
1.5 Trends in Power and Energy in Integrated Circuits
1.6 Trends in Cost
1.7 Dependability
1.8 Measuring, Reporting, and Summarizing Performance
1.9 Quantitative Principles of Computer Design
1.10 Putting It All Together: Performance, Price, and Power
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspectives and References
Case Studies and Exercises by Diana Franklin
References
2. Memory Hierarchy Design
Abstract
2.1 Introduction
2.2 Memory Technology and Optimizations
2.3 Ten Advanced Optimizations of Cache Performance
2.4 Virtual Memory and Virtual Machines
2.5 Cross-Cutting Issues: The Design of Memory Hierarchies
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700
2.7 Fallacies and Pitfalls
2.8 Concluding Remarks: Looking Ahead
2.9 Historical Perspectives and References
Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li
References
3. Instruction-Level Parallelism and Its Exploitation
Abstract
3.1 Instruction-Level Parallelism: Concepts and Challenges
3.2 Basic Compiler Techniques for Exposing ILP
3.3 Reducing Branch Costs With Advanced Branch Prediction
3.4 Overcoming Data Hazards With Dynamic Scheduling
3.5 Dynamic Scheduling: Examples and the Algorithm
3.6 Hardware-Based Speculation
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
3.9 Advanced Techniques for Instruction Delivery and Speculation
3.10 Cross-Cutting Issues
3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
3.13 Fallacies and Pitfalls
3.14 Concluding Remarks: What's Ahead?
3.15 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
References
4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Abstract
4.1 Introduction
4.2 Vector Architecture
4.3 SIMD Instruction Set Extensions for Multimedia
4.4 Graphics Processing Units
4.5 Detecting and Enhancing Loop-Level Parallelism
4.6 Cross-Cutting Issues
4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7
4.8 Fallacies and Pitfalls
4.9 Concluding Remarks
4.10 Historical Perspective and References
Case Study and Exercises by Jason D. Bakos
References
5. Thread-Level Parallelism
Abstract
5.1 Introduction
5.2 Centralized Shared-Memory Architectures
5.3 Performance of Symmetric Shared-Memory Multiprocessors
5.4 Distributed Shared-Memory and Directory-Based Coherence
5.5 Synchronization: The Basics
5.6 Models of Memory Consistency: An Introduction
5.7 Cross-Cutting Issues
5.8 Putting It All Together: Multicore Processors and Their Performance
5.9 Fallacies and Pitfalls
5.10 The Future of Multicore Scaling
5.11 Concluding Remarks
5.12 Historical Perspectives and References
Case Studies and Exercises by Amr Zaky and David A. Wood
References
6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
Abstract
6.1 Introduction
6.2 Programming Models and Workloads for Warehouse-Scale Computers
6.3 Computer Architecture of Warehouse-Scale Computers
6.4 The Efficiency and Cost of Warehouse-Scale Computers
6.5 Cloud Computing: The Return of Utility Computing
6.6 Cross-Cutting Issues
6.7 Putting It All Together: A Google Warehouse-Scale Computer
6.8 Fallacies and Pitfalls
6.9 Concluding Remarks
6.10 Historical Perspectives and References
Case Studies and Exercises by Parthasarathy Ranganathan
References
7. Domain-Specific Architectures
Abstract
7.1 Introduction
7.2 Guidelines for DSAs
7.3 Example Domain: Deep Neural Networks
7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator
7.5 Microsoft Catapult, a Flexible Data Center Accelerator
7.6 Intel Crest, a Data Center Accelerator for Training
7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit
7.8 Cross-Cutting Issues
7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators
7.10 Fallacies and Pitfalls
7.11 Concluding Remarks
7.12 Historical Perspectives and References
Case Studies and Exercises by Cliff Young
References
Appendix A. Instruction Set Principles
Abstract
A.1 Introduction
A.2 Classifying Instruction Set Architectures
A.3 Memory Addressing
A.4 Type and Size of Operands
A.5 Operations in the Instruction Set
A.6 Instructions for Control Flow
A.7 Encoding an Instruction Set
A.8 Cross-Cutting Issues: The Role of Compilers
A.9 Putting It All Together: The RISC-V Architecture
A.10 Fallacies and Pitfalls
References
Appendix B. Review of Memory Hierarchy
Abstract
B.1 Introduction
B.2 Cache Performance
B.3 Six Basic Cache Optimizations
B.4 Virtual Memory
B.5 Protection and Examples of Virtual Memory
B.6 Fallacies and Pitfalls
B.7 Concluding Remarks
B.8 Historical Perspective and References
Exercises by Amr Zaky
References
Appendix C. Pipelining: Basic and Intermediate Concepts
Abstract
C.1 Introduction
C.2 The Major Hurdle of Pipelining—Pipeline Hazards
C.3 How Is Pipelining Implemented?
C.4 What Makes Pipelining Hard to Implement?
C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations
C.6 Putting It All Together: The MIPS R4000 Pipeline
C.7 Cross-Cutting Issues
C.8 Fallacies and Pitfalls
C.9 Concluding Remarks
C.10 Historical Perspective and References
Updated Exercises by Diana Franklin
References
Appendix D. Storage Systems
D.1 Introduction
D.2 Advanced Topics in Disk Storage
D.3 Definition and Examples of Real Faults and Failures
D.4 I/O Performance, Reliability Measures, and Benchmarks
D.5 A Little Queuing Theory
D.6 Cross-Cutting Issues
D.7 Designing and Evaluating an I/O System—The Internet Archive Cluster
D.8 Putting It All Together: NetApp FAS6000 Filer
D.9 Fallacies and Pitfalls
D.10 Concluding Remarks
D.11 Historical Perspective and References
Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau
References
Appendix E. Embedded Systems
E.1 Introduction
E.2 Signal Processing and Embedded Applications: The Digital Signal Processor
E.3 Embedded Benchmarks
E.4 Embedded Multiprocessors
E.5 Case Study: The Emotion Engine of the Sony PlayStation 2
E.6 Case Study: Sanyo VPC-SX500 Digital Camera
E.7 Case Study: Inside a Cell Phone
E.8 Concluding Remarks
References
Appendix F. Interconnection Networks
F.1 Introduction
F.2 Interconnecting Two Devices
F.3 Connecting More than Two Devices
F.4 Network Topology
F.5 Network Routing, Arbitration, and Switching
F.6 Switch Microarchitecture
F.7 Practical Issues for Commercial Interconnection Networks
F.8 Examples of Interconnection Networks
F.9 Internetworking
F.10 Cross-Cutting Issues for Interconnection Networks
F.11 Fallacies and Pitfalls
F.12 Concluding Remarks
F.13 Historical Perspective and References
Exercises
References
Appendix G. Vector Processors in More Depth
G.1 Introduction
G.2 Vector Performance in More Depth
G.3 Vector Memory Systems in More Depth
G.4 Enhancing Vector Performance
G.5 Effectiveness of Compiler Vectorization
G.6 Putting It All Together: Performance of Vector Processors
G.7 A Modern Vector Supercomputer: The Cray X1
G.8 Concluding Remarks
G.9 Historical Perspective and References
Exercises
References
Appendix H. Hardware and Software for VLIW and EPIC
H.1 Introduction: Exploiting Instruction-Level Parallelism Statically
H.2 Detecting and Enhancing Loop-Level Parallelism
H.3 Scheduling and Structuring Code for Parallelism
H.4 Hardware Support for Exposing Parallelism: Predicated Instructions
H.5 Hardware Support for Compiler Speculation
H.6 The Intel IA-64 Architecture and Itanium Processor
H.7 Concluding Remarks
Reference
Appendix I. Large-Scale Multiprocessors and Scientific Applications
I.1 Introduction
I.2 Interprocessor Communication: The Critical Performance Issue
I.3 Characteristics of Scientific Applications
I.4 Synchronization: Scaling Up
I.5 Performance of Scientific Applications on Shared-Memory Multiprocessors
I.6 Performance Measurement of Parallel Processors with Scientific Applications
I.7 Implementing Cache Coherence
I.8 The Custom Cluster Approach: Blue Gene/L
I.9 Concluding Remarks
References
Appendix J. Computer Arithmetic
J.1 Introduction
J.2 Basic Techniques of Integer Arithmetic
J.3 Floating Point
J.4 Floating-Point Multiplication
J.5 Floating-Point Addition
J.6 Division and Remainder
J.7 More on Floating-Point Arithmetic
J.8 Speeding Up Integer Addition
J.9 Speeding Up Integer Multiplication and Division
J.10 Putting It All Together
J.11 Fallacies and Pitfalls
J.12 Historical Perspective and References
Exercises
References
Appendix K. Survey of Instruction Set Architectures
K.1 Introduction
K.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers
K.3 The Intel 80x86
K.4 The VAX Architecture
K.5 The IBM 360/370 Architecture for Mainframe Computers
K.6 Historical Perspective and References
Acknowledgments
Appendix L. Advanced Concepts on Address Translation
Appendix M. Historical Perspectives and References
M.1 Introduction
M.2 The Early Development of Computers (Chapter 1)
References
M.3 The Development of Memory Hierarchy and Protection (Chapter 2 and Appendix B)
References
M.4 The Evolution of Instruction Sets (Appendices A, J, and K)
References
M.5 The Development of Pipelining and Instruction-Level Parallelism (Chapter 3 and Appendices C and H)
References
M.6 The Development of SIMD Supercomputers, Vector Computers, Multimedia SIMD Instruction Extensions, and Graphical Processor Units (Chapter 4)
References
M.7 The History of Multiprocessors and Parallel Processing (Chapter 5 and Appendices F, G, and I)
References
M.8 The Development of Clusters (Chapter 6)
References
M.9 Historical Perspectives and References
References
M.10 The History of Magnetic Storage, RAID, and I/O Buses (Appendix D)
References
References
Index
Back End Sheet
Inside Back Cover