Heterogeneous Computing with OpenCL 2.0 by Kaeli, David R.

Index

Cover image Title page Table of Contents Copyright List of Figures List of Tables Foreword Acknowledgments

Chapter 1: Introduction

Abstract 1.1 Introduction to Heterogeneous Computing 1.2 The Goals of This Book 1.3 Thinking Parallel 1.4 Concurrency and Parallel Programming Models 1.5 Threads and Shared Memory 1.6 Message-Passing Communication 1.7 Different Grains of Parallelism 1.8 Heterogeneous Computing with OpenCL 1.9 Book Structure

Chapter 2: Device architectures

Abstract 2.1 Introduction 2.2 Hardware Trade-offs 2.3 The Architectural Design Space 2.4 Summary

Chapter 3: Introduction to OpenCL

Abstract 3.1 Introduction 3.2 The OpenCL Platform Model 3.3 The OpenCL Execution Model 3.4 Kernels and the OpenCL Programming Model 3.5 OpenCL Memory Model 3.6 The OpenCL Runtime with an Example 3.7 Vector Addition Using an OpenCL C++ Wrapper 3.8 OpenCL for CUDA Programmers 3.9 Summary

Chapter 4: Examples

Abstract 4.1 OpenCL Examples 4.2 Histogram 4.3 Image Rotation 4.4 Image Convolution 4.5 Producer-Consumer 4.6 Utility Functions 4.7 Summary

Chapter 5: OpenCL runtime and concurrency model

Abstract 5.1 Commands and the Queuing Model 5.2 Multiple Command-Queues 5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges 5.4 Native and Built-In Kernels 5.5 Device-Side Queuing 5.6 Summary

Chapter 6: OpenCL host-side memory model

Abstract 6.1 Memory Objects 6.2 Memory Management 6.3 Shared Virtual Memory 6.4 Summary

Chapter 7: OpenCL device-side memory model

Abstract 7.1 Synchronization and Communication 7.2 Global Memory 7.3 Constant Memory 7.4 Local Memory 7.5 Private Memory 7.6 Generic Address Space 7.7 Memory Ordering 7.8 Summary

Chapter 8: Dissecting OpenCL on a heterogeneous system

Abstract 8.1 OpenCL on an AMD FX-8350 CPU 8.2 OpenCL on the AMD Radeon R9 290X GPU 8.3 Memory Performance Considerations in OpenCL 8.4 Summary

Chapter 9: Case study: Image clustering

Abstract 9.1 Introduction 9.2 The Feature Histogram on the CPU 9.3 OpenCL Implementation 9.4 Performance Analysis 9.5 Conclusion

Chapter 10: OpenCL profiling and debugging

Abstract 10.1 Introduction 10.2 Profiling OpenCL Code Using Events 10.3 AMD CodeXL 10.4 Profiling Using CodeXL 10.5 Analyzing Kernels Using CodeXL 10.6 Debugging OpenCL Kernels Using CodeXL 10.7 Debugging Using printf 10.8 Summary

Chapter 11: Mapping high-level programming languages to OpenCL 2.0: A compiler writer's perspective

Abstract 11.1 Introduction 11.2 A Brief Introduction to C++ AMP 11.3 OpenCL 2.0 as a Compiler Target 11.4 Mapping Key C++ AMP Constructs to OpenCL 11.5 C++ AMP Compilation Flow 11.6 Compiled C++ AMP Code 11.7 How Shared Virtual Memory in OpenCL 2.0 Fits in 11.8 Compiler Support for Tiling in C++ AMP 11.9 Address Space Deduction 11.10 Data Movement Optimization 11.11 Binomial Options: A Full Example 11.12 Preliminary Results 11.13 Conclusion

Chapter 12: WebCL: Enabling OpenCL acceleration of Web applications

Abstract 12.1 Introduction 12.2 Programming with WebCL 12.3 Synchronization 12.4 Interoperability with WebGL 12.5 Example Application 12.6 Security Enhancement 12.7 WebCL on the Server 12.8 Status and Future of WebCL Works Cited

Chapter 13: Foreign lands: Plugging OpenCL in

Abstract 13.1 Introduction 13.2 Beyond C and C++ 13.3 Haskell OpenCL 13.4 Summary

Index