Heterogeneous Computing with OpenCL by Mistry, Perhaad

Index

Cover image
Title page
Table of Contents
Copyright
Foreword to the Revised OpenCL 1.2 Edition
Foreword to the First Edition
Preface
  Our Heterogeneous World
  OpenCL
  This Text
Acknowledgments
About the Authors

Chapter 1. Introduction to Parallel Programming

  Introduction
  OpenCL
  The Goals of This Book
  Thinking Parallel
  Concurrency and Parallel Programming Models
  Structure
  Reference
  Further Reading and Relevant Websites

Chapter 2. Introduction to OpenCL

  Introduction
  Platform and Devices
  The Execution Environment
  Memory Model
  Writing Kernels
  Full Source Code Example for Vector Addition
  Vector Addition with C++ Wrapper
  Summary
  Reference

Chapter 3. OpenCL Device Architectures

  Introduction
  Hardware trade-offs
  The architectural design space
  Summary
  References

Chapter 4. Basic OpenCL Examples

  Introduction
  Example Applications
  Compiling OpenCL Host Applications
  Summary

Chapter 5. Understanding OpenCL’s Concurrency and Execution Model

  Introduction
  Kernels, Work-Items, Workgroups, and the Execution Domain
  OpenCL Synchronization: Kernels, Fences, and Barriers
  Queuing and Global Synchronization
  The Host-Side Memory Model
  The Device-Side Memory Model
  Summary

Chapter 6. Dissecting a CPU/GPU OpenCL Implementation

  Introduction
  OpenCL on an AMD Bulldozer CPU
  OpenCL on the AMD Radeon HD7970 GPU
  Memory Performance Considerations in OpenCL
  Summary
  References

Chapter 7. Data Management

  Memory management
  Data transfer in a discrete environment
  Data placement in a shared-memory environment
  Example application—work group reduction
  References

Chapter 8. OpenCL Case Study: Convolution

  Introduction
  Convolution Kernel
  Conclusions
  Code Listings
  Reference

Chapter 9. OpenCL Case Study: Histogram

  Introduction
  Choosing the Number of Workgroups
  Choosing the Optimal Workgroup Size
  Optimizing Global Memory Data Access Patterns
  Using Atomics to Perform Local Histogram
  Optimizing Local Memory Access
  Local Histogram Reduction
  The Global Reduction
  Full Kernel Code
  Performance and Summary

Chapter 10. OpenCL Case Study: Mixed Particle Simulation

  Introduction
  Overview of the Computation
  GPU Implementation
  CPU Implementation
  Load Balancing
  Performance and Summary
  Kernel for Uniform Grid Creation
  Kernels for Simulation

Chapter 11. OpenCL Extensions

  Introduction
  Overview of Extension Mechanism
  Device Fission
  Double Precision
  References

Chapter 12. Foreign Lands: Plugging OpenCL In

  Introduction
  Beyond C and C++
  Haskell OpenCL
  Summary
  References

Chapter 13. OpenCL Profiling and Debugging

  Introduction
  Profiling with events
  AMD Accelerated Parallel Processing Profiler
  AMD Accelerated Parallel Processing KernelAnalyzer
  Walking through the AMD APP Profiler
  Debugging OpenCL Applications
  Overview of gDEBugger
  AMD Printf Extension
  Conclusion

Chapter 14. Performance Optimization of an Image Analysis Application

  Introduction
  Description of the algorithm
  Migrating multithreaded CPU implementation to OpenCL
  Performance optimization
  Power and performance analysis
  Conclusion
  References

Index