Hands-On GPU Programming with Python and CUDA by Tuomanen, Brian

Index

Title Page Copyright and Credits

Hands-On GPU Programming with Python and CUDA

Dedication About Packt

Why subscribe? Packt.com

Contributors

About the author About the reviewer Packt is searching for authors like you

Preface

Who this book is for What this book covers To get the most out of this book

Download the example code files Download the color images Conventions used

Get in touch

Reviews

Why GPU Programming?

Technical requirements Parallelization and Amdahl's Law

Using Amdahl's Law The Mandelbrot set

Profiling your code

Using the cProfile module

Summary Questions

Setting Up Your GPU Programming Environment

Technical requirements Ensuring that we have the right hardware

Checking your hardware (Linux) Checking your hardware (Windows)

Installing the GPU drivers

Installing the GPU drivers (Linux) Installing the GPU drivers (Windows)

Setting up a C++ programming environment

Setting up GCC, Eclipse IDE, and graphical dependencies (Linux) Setting up Visual Studio (Windows) Installing the CUDA Toolkit

Installing the CUDA Toolkit (Linux) Installing the CUDA Toolkit (Windows)

Setting up our Python environment for GPU programming

Installing PyCUDA (Linux) Creating an environment launch script (Windows) Installing PyCUDA (Windows) Testing PyCUDA

Summary Questions

Getting Started with PyCUDA

Technical requirements Querying your GPU

Querying your GPU with PyCUDA

Using PyCUDA's gpuarray class

Transferring data to and from the GPU with gpuarray Basic pointwise arithmetic operations with gpuarray

A speed test

Using PyCUDA's ElementWiseKernel for performing pointwise computations

Mandelbrot revisited A brief foray into functional programming Parallel scan and reduction kernel basics

Summary Questions

Kernels, Threads, Blocks, and Grids

Technical requirements Kernels

The PyCUDA SourceModule function

Threads, blocks, and grids

Conway's game of life

Thread synchronization and intercommunication

Using the __syncthreads() device function Using shared memory

The parallel prefix algorithm

The naive parallel prefix algorithm Inclusive versus exclusive prefix A work-efficient parallel prefix algorithm

Work-efficient parallel prefix (up-sweep phase) Work-efficient parallel prefix (down-sweep phase)

Work-efficient parallel prefix — implementation

Summary Questions

Streams, Events, Contexts, and Concurrency

Technical requirements CUDA device synchronization

Using the PyCUDA stream class Concurrent Conway's game of life using CUDA streams

Events

Events and streams

Contexts

Synchronizing the current context Manual context creation Host-side multiprocessing and multithreading Multiple contexts for host-side concurrency

Summary Questions

Debugging and Profiling Your CUDA Code

Technical requirements Using printf from within CUDA kernels

Using printf for debugging

Filling in the gaps with CUDA-C Using the Nsight IDE for CUDA-C development and debugging

Using Nsight with Visual Studio in Windows Using Nsight with Eclipse in Linux Using Nsight to understand the warp lockstep property in CUDA

Using the NVIDIA nvprof profiler and Visual Profiler Summary Questions

Using the CUDA Libraries with Scikit-CUDA

Technical requirements Installing Scikit-CUDA Basic linear algebra with cuBLAS

Level-1 AXPY with cuBLAS Other level-1 cuBLAS functions Level-2 GEMV in cuBLAS Level-3 GEMM in cuBLAS for measuring GPU performance

Fast Fourier transforms with cuFFT

A simple 1D FFT Using an FFT for convolution Using cuFFT for 2D convolution

Using cuSolver from Scikit-CUDA

Singular value decomposition (SVD) Using SVD for Principal Component Analysis (PCA)

Summary Questions

The CUDA Device Function Libraries and Thrust

Technical requirements The cuRAND device function library

Estimating π with Monte Carlo

The CUDA Math API

A brief review of definite integration Computing definite integrals with the Monte Carlo method Writing some test cases

The CUDA Thrust library

Using functors in Thrust

Summary Questions

Implementation of a Deep Neural Network

Technical requirements Artificial neurons and neural networks

Implementing a dense layer of artificial neurons

Implementation of the softmax layer Implementation of Cross-Entropy loss Implementation of a sequential network

Implementation of inference methods Gradient descent Conditioning and normalizing data

The Iris dataset Summary Questions

Working with Compiled GPU Code

Launching compiled code with Ctypes

The Mandelbrot set revisited (again)

Compiling the code and interfacing with Ctypes

Compiling and launching pure PTX code Writing wrappers for the CUDA Driver API

Using the CUDA Driver API

Summary Questions

Performance Optimization in CUDA

Dynamic parallelism

Quicksort with dynamic parallelism

Vectorized data types and memory access Thread-safe atomic operations Warp shuffling Inline PTX assembly Performance-optimized array sum Summary Questions

Where to Go from Here

Furthering your knowledge of CUDA and GPGPU programming

Multi-GPU systems Cluster computing and MPI OpenCL and PyOpenCL

Graphics

OpenGL DirectX 12 Vulkan

Machine learning and computer vision

The basics cuDNN TensorFlow and Keras Chainer OpenCV

Blockchain technology Summary Questions

Assessment

Chapter 1, Why GPU Programming?
Chapter 2, Setting Up Your GPU Programming Environment
Chapter 3, Getting Started with PyCUDA
Chapter 4, Kernels, Threads, Blocks, and Grids
Chapter 5, Streams, Events, Contexts, and Concurrency
Chapter 6, Debugging and Profiling Your CUDA Code
Chapter 7, Using the CUDA Libraries with Scikit-CUDA
Chapter 8, The CUDA Device Function Libraries and Thrust
Chapter 9, Implementation of a Deep Neural Network
Chapter 10, Working with Compiled GPU Code
Chapter 11, Performance Optimization in CUDA
Chapter 12, Where to Go from Here

Other Books You May Enjoy

Leave a review - let other readers know what you think
