Cover image
Title page
Table of Contents
Inside Front Cover
In Praise of Computer Architecture: A Quantitative Approach Sixth Edition
Copyright
Dedication
Foreword
Preface
Why We Wrote This Book
This Edition
Topic Selection and Organization
An Overview of the Content
Navigating the Text
Chapter Structure
Case Studies With Exercises
Supplemental Materials
Helping Improve This Book
Concluding Remarks
Acknowledgments
1. Fundamentals of Quantitative Design and Analysis
Abstract
1.1 Introduction
1.2 Classes of Computers
1.3 Defining Computer Architecture
1.4 Trends in Technology
1.5 Trends in Power and Energy in Integrated Circuits
1.6 Trends in Cost
1.7 Dependability
1.8 Measuring, Reporting, and Summarizing Performance
1.9 Quantitative Principles of Computer Design
1.10 Putting It All Together: Performance, Price, and Power
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspectives and References
Case Studies and Exercises by Diana Franklin
References
2. Memory Hierarchy Design
Abstract
2.1 Introduction
2.2 Memory Technology and Optimizations
2.3 Ten Advanced Optimizations of Cache Performance
2.4 Virtual Memory and Virtual Machines
2.5 Cross-Cutting Issues: The Design of Memory Hierarchies
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700
2.7 Fallacies and Pitfalls
2.8 Concluding Remarks: Looking Ahead
2.9 Historical Perspectives and References
Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li
References
3. Instruction-Level Parallelism and Its Exploitation
Abstract
3.1 Instruction-Level Parallelism: Concepts and Challenges
3.2 Basic Compiler Techniques for Exposing ILP
3.3 Reducing Branch Costs With Advanced Branch Prediction
3.4 Overcoming Data Hazards With Dynamic Scheduling
3.5 Dynamic Scheduling: Examples and the Algorithm
3.6 Hardware-Based Speculation
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
3.9 Advanced Techniques for Instruction Delivery and Speculation
3.10 Cross-Cutting Issues
3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
3.13 Fallacies and Pitfalls
3.14 Concluding Remarks: What's Ahead?
3.15 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
References
4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Abstract
4.1 Introduction
4.2 Vector Architecture
4.3 SIMD Instruction Set Extensions for Multimedia
4.4 Graphics Processing Units
4.5 Detecting and Enhancing Loop-Level Parallelism
4.6 Cross-Cutting Issues
4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7
4.8 Fallacies and Pitfalls
4.9 Concluding Remarks
4.10 Historical Perspective and References
Case Study and Exercises by Jason D. Bakos
References
5. Thread-Level Parallelism
Abstract
5.1 Introduction
5.2 Centralized Shared-Memory Architectures
5.3 Performance of Symmetric Shared-Memory Multiprocessors
5.4 Distributed Shared-Memory and Directory-Based Coherence
5.5 Synchronization: The Basics
5.6 Models of Memory Consistency: An Introduction
5.7 Cross-Cutting Issues
5.8 Putting It All Together: Multicore Processors and Their Performance
5.9 Fallacies and Pitfalls
5.10 The Future of Multicore Scaling
5.11 Concluding Remarks
5.12 Historical Perspectives and References
Case Studies and Exercises by Amr Zaky and David A. Wood
References
6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
Abstract
6.1 Introduction
6.2 Programming Models and Workloads for Warehouse-Scale Computers
6.3 Computer Architecture of Warehouse-Scale Computers
6.4 The Efficiency and Cost of Warehouse-Scale Computers
6.5 Cloud Computing: The Return of Utility Computing
6.6 Cross-Cutting Issues
6.7 Putting It All Together: A Google Warehouse-Scale Computer
6.8 Fallacies and Pitfalls
6.9 Concluding Remarks
6.10 Historical Perspectives and References
Case Studies and Exercises by Parthasarathy Ranganathan
References
7. Domain-Specific Architectures
Abstract
7.1 Introduction
7.2 Guidelines for DSAs
7.3 Example Domain: Deep Neural Networks
7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator
7.5 Microsoft Catapult, a Flexible Data Center Accelerator
7.6 Intel Crest, a Data Center Accelerator for Training
7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit
7.8 Cross-Cutting Issues
7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators
7.10 Fallacies and Pitfalls
7.11 Concluding Remarks
7.12 Historical Perspectives and References
Case Studies and Exercises by Cliff Young
References
Appendix A. Instruction Set Principles
Abstract
A.1 Introduction
A.2 Classifying Instruction Set Architectures
A.3 Memory Addressing
A.4 Type and Size of Operands
A.5 Operations in the Instruction Set
A.6 Instructions for Control Flow
A.7 Encoding an Instruction Set
A.8 Cross-Cutting Issues: The Role of Compilers
A.9 Putting It All Together: The RISC-V Architecture
A.10 Fallacies and Pitfalls
References
Appendix B. Review of Memory Hierarchy
Abstract
B.1 Introduction
B.2 Cache Performance
B.3 Six Basic Cache Optimizations
B.4 Virtual Memory
B.5 Protection and Examples of Virtual Memory
B.6 Fallacies and Pitfalls
B.7 Concluding Remarks
B.8 Historical Perspective and References
Exercises by Amr Zaky
References
Appendix C. Pipelining: Basic and Intermediate Concepts
Abstract
C.1 Introduction
C.2 The Major Hurdle of Pipelining—Pipeline Hazards
C.3 How Is Pipelining Implemented?
C.4 What Makes Pipelining Hard to Implement?
C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations
C.6 Putting It All Together: The MIPS R4000 Pipeline
C.7 Cross-Cutting Issues
C.8 Fallacies and Pitfalls
C.9 Concluding Remarks
C.10 Historical Perspective and References
Updated Exercises by Diana Franklin
References
Appendix D. Storage Systems
D.1 Introduction
D.2 Advanced Topics in Disk Storage
D.3 Definition and Examples of Real Faults and Failures
D.4 I/O Performance, Reliability Measures, and Benchmarks
D.5 A Little Queuing Theory
D.6 Cross-Cutting Issues
D.7 Designing and Evaluating an I/O System—The Internet Archive Cluster
D.8 Putting It All Together: NetApp FAS6000 Filer
D.9 Fallacies and Pitfalls
D.10 Concluding Remarks
D.11 Historical Perspective and References
Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau
References
Appendix E. Embedded Systems
E.1 Introduction
E.2 Signal Processing and Embedded Applications: The Digital Signal Processor
E.3 Embedded Benchmarks
E.4 Embedded Multiprocessors
E.5 Case Study: The Emotion Engine of the Sony PlayStation 2
E.6 Case Study: Sanyo VPC-SX500 Digital Camera
E.7 Case Study: Inside a Cell Phone
E.8 Concluding Remarks
References
Appendix F. Interconnection Networks
F.1 Introduction
F.2 Interconnecting Two Devices
F.3 Connecting More than Two Devices
F.4 Network Topology
F.5 Network Routing, Arbitration, and Switching
F.6 Switch Microarchitecture
F.7 Practical Issues for Commercial Interconnection Networks
F.8 Examples of Interconnection Networks
F.9 Internetworking
F.10 Cross-Cutting Issues for Interconnection Networks
F.11 Fallacies and Pitfalls
F.12 Concluding Remarks
F.13 Historical Perspective and References
Exercises
References
Appendix G. Vector Processors in More Depth
G.1 Introduction
G.2 Vector Performance in More Depth
G.3 Vector Memory Systems in More Depth
G.4 Enhancing Vector Performance
G.5 Effectiveness of Compiler Vectorization
G.6 Putting It All Together: Performance of Vector Processors
G.7 A Modern Vector Supercomputer: The Cray X1
G.8 Concluding Remarks
G.9 Historical Perspective and References
Exercises
References
Appendix H. Hardware and Software for VLIW and EPIC
H.1 Introduction: Exploiting Instruction-Level Parallelism Statically
H.2 Detecting and Enhancing Loop-Level Parallelism
H.3 Scheduling and Structuring Code for Parallelism
H.4 Hardware Support for Exposing Parallelism: Predicated Instructions
H.5 Hardware Support for Compiler Speculation
H.6 The Intel IA-64 Architecture and Itanium Processor
H.7 Concluding Remarks
Reference
Appendix I. Large-Scale Multiprocessors and Scientific Applications
I.1 Introduction
I.2 Interprocessor Communication: The Critical Performance Issue
I.3 Characteristics of Scientific Applications
I.4 Synchronization: Scaling Up
I.5 Performance of Scientific Applications on Shared-Memory Multiprocessors
I.6 Performance Measurement of Parallel Processors with Scientific Applications
I.7 Implementing Cache Coherence
I.8 The Custom Cluster Approach: Blue Gene/L
I.9 Concluding Remarks
References
Appendix J. Computer Arithmetic
J.1 Introduction
J.2 Basic Techniques of Integer Arithmetic
J.3 Floating Point
J.4 Floating-Point Multiplication
J.5 Floating-Point Addition
J.6 Division and Remainder
J.7 More on Floating-Point Arithmetic
J.8 Speeding Up Integer Addition
J.9 Speeding Up Integer Multiplication and Division
J.10 Putting It All Together
J.11 Fallacies and Pitfalls
J.12 Historical Perspective and References
Exercises
References
Appendix K. Survey of Instruction Set Architectures
K.1 Introduction
K.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers
K.3 The Intel 80x86
K.4 The VAX Architecture
K.5 The IBM 360/370 Architecture for Mainframe Computers
K.6 Historical Perspective and References
Acknowledgments
Appendix L. Advanced Concepts on Address Translation
Appendix M. Historical Perspectives and References
M.1 Introduction
M.2 The Early Development of Computers (Chapter 1)
References
M.3 The Development of Memory Hierarchy and Protection (Chapter 2 and Appendix B)
References
M.4 The Evolution of Instruction Sets (Appendices A, J, and K)
References
M.5 The Development of Pipelining and Instruction-Level Parallelism (Chapter 3 and Appendices C and H)
References
M.6 The Development of SIMD Supercomputers, Vector Computers, Multimedia SIMD Instruction Extensions, and Graphical Processor Units (Chapter 4)
References
M.7 The History of Multiprocessors and Parallel Processing (Chapter 5 and Appendices F, G, and I)
References
M.8 The Development of Clusters (Chapter 6)
References
M.9 Historical Perspectives and References
References
M.10 The History of Magnetic Storage, RAID, and I/O Buses (Appendix D)
References
References
Index
Back End Sheet
Inside Back Cover