Index
Cover image
Title page
Table of Contents
Copyright
Preface
Target Audience How to Use the Book Online Supplements
Acknowledgements
Dedication
Chapter 1. Introduction
1.1 Heterogeneous Parallel Computing 1.2 Architecture of a Modern GPU 1.3 Why More Speed or Parallelism? 1.4 Speeding Up Real Applications 1.5 Parallel Programming Languages and Models 1.6 Overarching Goals 1.7 Organization of the Book References
Chapter 2. History of GPU Computing
2.1 Evolution of Graphics Pipelines 2.2 GPGPU: An Intermediate Step 2.3 GPU Computing References and Further Reading
Chapter 3. Introduction to Data Parallelism and CUDA C
3.1 Data Parallelism 3.2 CUDA Program Structure 3.3 A Vector Addition Kernel 3.4 Device Global Memory and Data Transfer 3.5 Kernel Functions and Threading 3.6 Summary 3.7 Exercises References
Chapter 4. Data-Parallel Execution Model
4.1 CUDA Thread Organization 4.2 Mapping Threads to Multidimensional Data 4.3 Matrix-Matrix Multiplication—A More Complex Kernel 4.4 Synchronization and Transparent Scalability 4.5 Assigning Resources to Blocks 4.6 Querying Device Properties 4.7 Thread Scheduling and Latency Tolerance 4.8 Summary 4.9 Exercises
Chapter 5. CUDA Memories
5.1 Importance of Memory Access Efficiency 5.2 CUDA Device Memory Types 5.3 A Strategy for Reducing Global Memory Traffic 5.4 A Tiled Matrix–Matrix Multiplication Kernel 5.5 Memory as a Limiting Factor to Parallelism 5.6 Summary 5.7 Exercises
Chapter 6. Performance Considerations
6.1 Warps and Thread Execution 6.2 Global Memory Bandwidth 6.3 Dynamic Partitioning of Execution Resources 6.4 Instruction Mix and Thread Granularity 6.5 Summary 6.6 Exercises References
Chapter 7. Floating-Point Considerations
7.1 Floating-Point Format 7.2 Representable Numbers 7.3 Special Bit Patterns and Precision in IEEE Format 7.4 Arithmetic Accuracy and Rounding 7.5 Algorithm Considerations 7.6 Numerical Stability 7.7 Summary 7.8 Exercises References
Chapter 8. Parallel Patterns: Convolution: With an Introduction to Constant Memory and Caches
8.1 Background 8.2 1D Parallel Convolution—A Basic Algorithm 8.3 Constant Memory and Caching 8.4 Tiled 1D Convolution with Halo Elements 8.5 A Simpler Tiled 1D Convolution—General Caching 8.6 Summary 8.7 Exercises
Chapter 9. Parallel Patterns: Prefix Sum: An Introduction to Work Efficiency in Parallel Algorithms
9.1 Background 9.2 A Simple Parallel Scan 9.3 Work Efficiency Considerations 9.4 A Work-Efficient Parallel Scan 9.5 Parallel Scan for Arbitrary-Length Inputs 9.6 Summary 9.7 Exercises Reference
Chapter 10. Parallel Patterns: Sparse Matrix–Vector Multiplication: An Introduction to Compaction and Regularization in Parallel Algorithms
10.1 Background 10.2 Parallel SpMV Using CSR 10.3 Padding and Transposition 10.4 Using Hybrid to Control Padding 10.5 Sorting and Partitioning for Regularization 10.6 Summary 10.7 Exercises References
Chapter 11. Application Case Study: Advanced MRI Reconstruction
11.1 Application Background 11.2 Iterative Reconstruction 11.3 Computing FᴴD 11.4 Final Evaluation 11.5 Exercises References
Chapter 12. Application Case Study: Molecular Visualization and Analysis
12.1 Application Background 12.2 A Simple Kernel Implementation 12.3 Thread Granularity Adjustment 12.4 Memory Coalescing 12.5 Summary 12.6 Exercises References
Chapter 13. Parallel Programming and Computational Thinking
13.1 Goals of Parallel Computing 13.2 Problem Decomposition 13.3 Algorithm Selection 13.4 Computational Thinking 13.5 Summary 13.6 Exercises References
Chapter 14. An Introduction to OpenCL™
14.1 Background 14.2 Data Parallelism Model 14.3 Device Architecture 14.4 Kernel Functions 14.5 Device Management and Kernel Launch 14.6 Electrostatic Potential Map in OpenCL 14.7 Summary 14.8 Exercises References
Chapter 15. Parallel Programming with OpenACC
15.1 OpenACC Versus CUDA C 15.2 Execution Model 15.3 Memory Model 15.4 Basic OpenACC Programs 15.5 Future Directions of OpenACC 15.6 Exercises
Chapter 16. Thrust: A Productivity-Oriented Library for CUDA
16.1 Background 16.2 Motivation 16.3 Basic Thrust Features 16.4 Generic Programming 16.5 Benefits of Abstraction 16.6 Programmer Productivity 16.7 Best Practices 16.8 Exercises References
Chapter 17. CUDA FORTRAN
17.1 CUDA FORTRAN and CUDA C Differences 17.2 A First CUDA FORTRAN Program 17.3 Multidimensional Array in CUDA FORTRAN 17.4 Overloading Host/Device Routines With Generic Interfaces 17.5 Calling CUDA C Via Iso_C_Binding 17.6 Kernel Loop Directives and Reduction Operations 17.7 Dynamic Shared Memory 17.8 Asynchronous Data Transfers 17.9 Compilation and Profiling 17.10 Calling Thrust from CUDA FORTRAN 17.11 Exercises
Chapter 18. An Introduction to C++ AMP
18.1 Core C++ AMP Features 18.2 Details of the C++ AMP Execution Model 18.3 Managing Accelerators 18.4 Tiled Execution 18.5 C++ AMP Graphics Features 18.6 Summary 18.7 Exercises
Chapter 19. Programming a Heterogeneous Computing Cluster
19.1 Background 19.2 A Running Example 19.3 MPI Basics 19.4 MPI Point-to-Point Communication Types 19.5 Overlapping Computation and Communication 19.6 MPI Collective Communication 19.7 Summary 19.8 Exercises Reference
Chapter 20. CUDA Dynamic Parallelism
20.1 Background 20.2 Dynamic Parallelism Overview 20.3 Important Details 20.4 Memory Visibility 20.5 A Simple Example 20.6 Runtime Limitations 20.7 A More Complex Example 20.8 Summary Reference
Chapter 21. Conclusion and Future Outlook
21.1 Goals Revisited 21.2 Memory Model Evolution 21.3 Kernel Execution Control Evolution 21.4 Core Performance 21.5 Programming Environment 21.6 Future Outlook References
Appendix A. Matrix Multiplication Host-Only Version Source Code
Appendix Outline A.1 matrixmul.cu A.2 matrixmul_gold.cpp A.3 matrixmul.h A.4 assist.h A.5 Expected Output
Appendix B. GPU Compute Capabilities
Appendix Outline B.1 GPU Compute Capability Tables B.2 Memory Coalescing Variations
Index