
Index
Title Page Copyright and Credits
Hands-On GPU Programming with Python and CUDA
Dedication About Packt
Why subscribe? Packt.com
Contributors
About the author About the reviewer Packt is searching for authors like you
Preface
Who this book is for What this book covers To get the most out of this book
Download the example code files Download the color images Conventions used
Get in touch
Reviews
Why GPU Programming?
Technical requirements Parallelization and Amdahl's Law
Using Amdahl's Law The Mandelbrot set
Profiling your code
Using the cProfile module
Summary Questions
Setting Up Your GPU Programming Environment
Technical requirements Ensuring that we have the right hardware
Checking your hardware (Linux) Checking your hardware (Windows)
Installing the GPU drivers
Installing the GPU drivers (Linux) Installing the GPU drivers (Windows)
Setting up a C++ programming environment
Setting up GCC, Eclipse IDE, and graphical dependencies (Linux) Setting up Visual Studio (Windows) Installing the CUDA Toolkit
Installing the CUDA Toolkit (Linux) Installing the CUDA Toolkit (Windows)
Setting up our Python environment for GPU programming
Installing PyCUDA (Linux) Creating an environment launch script (Windows) Installing PyCUDA (Windows) Testing PyCUDA
Summary Questions
Getting Started with PyCUDA
Technical requirements Querying your GPU
Querying your GPU with PyCUDA
Using PyCUDA's gpuarray class
Transferring data to and from the GPU with gpuarray Basic pointwise arithmetic operations with gpuarray
A speed test
Using PyCUDA's ElementWiseKernel for performing pointwise computations
Mandelbrot revisited A brief foray into functional programming Parallel scan and reduction kernel basics
Summary Questions
Kernels, Threads, Blocks, and Grids
Technical requirements Kernels
The PyCUDA SourceModule function
Threads, blocks, and grids
Conway's game of life
Thread synchronization and intercommunication
Using the __syncthreads() device function Using shared memory
The parallel prefix algorithm
The naive parallel prefix algorithm Inclusive versus exclusive prefix A work-efficient parallel prefix algorithm
Work-efficient parallel prefix (up-sweep phase) Work-efficient parallel prefix (down-sweep phase)
Work-efficient parallel prefix — implementation
Summary Questions
Streams, Events, Contexts, and Concurrency
Technical requirements CUDA device synchronization
Using the PyCUDA stream class Concurrent Conway's game of life using CUDA streams
Events
Events and streams
Contexts
Synchronizing the current context Manual context creation Host-side multiprocessing and multithreading Multiple contexts for host-side concurrency
Summary Questions
Debugging and Profiling Your CUDA Code
Technical requirements Using printf from within CUDA kernels
Using printf for debugging
Filling in the gaps with CUDA-C Using the Nsight IDE for CUDA-C development and debugging
Using Nsight with Visual Studio in Windows Using Nsight with Eclipse in Linux Using Nsight to understand the warp lockstep property in CUDA
Using the NVIDIA nvprof profiler and Visual Profiler Summary Questions
Using the CUDA Libraries with Scikit-CUDA
Technical requirements Installing Scikit-CUDA Basic linear algebra with cuBLAS
Level-1 AXPY with cuBLAS Other level-1 cuBLAS functions Level-2 GEMV in cuBLAS Level-3 GEMM in cuBLAS for measuring GPU performance
Fast Fourier transforms with cuFFT
A simple 1D FFT Using an FFT for convolution Using cuFFT for 2D convolution
Using cuSolver from Scikit-CUDA
Singular value decomposition (SVD) Using SVD for Principal Component Analysis (PCA)
Summary Questions
The CUDA Device Function Libraries and Thrust
Technical requirements The cuRAND device function library
Estimating π with Monte Carlo
The CUDA Math API
A brief review of definite integration Computing definite integrals with the Monte Carlo method Writing some test cases
The CUDA Thrust library
Using functors in Thrust
Summary Questions
Implementation of a Deep Neural Network
Technical requirements Artificial neurons and neural networks
Implementing a dense layer of artificial neurons
Implementation of the softmax layer Implementation of Cross-Entropy loss Implementation of a sequential network
Implementation of inference methods Gradient descent Conditioning and normalizing data
The Iris dataset Summary Questions
Working with Compiled GPU Code
Launching compiled code with Ctypes
The Mandelbrot set revisited (again)
Compiling the code and interfacing with Ctypes
Compiling and launching pure PTX code Writing wrappers for the CUDA Driver API
Using the CUDA Driver API
Summary Questions
Performance Optimization in CUDA
Dynamic parallelism
Quicksort with dynamic parallelism
Vectorized data types and memory access Thread-safe atomic operations Warp shuffling Inline PTX assembly Performance-optimized array sum Summary Questions
Where to Go from Here
Furthering your knowledge of CUDA and GPGPU programming
Multi-GPU systems Cluster computing and MPI OpenCL and PyOpenCL
Graphics
OpenGL DirectX 12 Vulkan
Machine learning and computer vision
The basics cuDNN TensorFlow and Keras Chainer OpenCV
Blockchain technology Summary Questions
Assessment
Chapter 1, Why GPU Programming?
Chapter 2, Setting Up Your GPU Programming Environment
Chapter 3, Getting Started with PyCUDA
Chapter 4, Kernels, Threads, Blocks, and Grids
Chapter 5, Streams, Events, Contexts, and Concurrency
Chapter 6, Debugging and Profiling Your CUDA Code
Chapter 7, Using the CUDA Libraries with Scikit-CUDA
Chapter 8, The CUDA Device Function Libraries and Thrust
Chapter 9, Implementation of a Deep Neural Network
Chapter 10, Working with Compiled GPU Code
Chapter 11, Performance Optimization in CUDA
Chapter 12, Where to Go from Here
Other Books You May Enjoy
Leave a review - let other readers know what you think

