Comparing CuPy to NumPy and CUDA

Let's compare CuPy to NumPy and CUDA in terms of how simply each one lets you parallelize computations. The following table summarizes the scope of CuPy relative to NumPy and CUDA, to help identify the scenarios in which CuPy could be advantageous over both. Here are some of the differences:

| | CUDA | NumPy | CuPy |
|---|---|---|---|
| Language | Based on the C/C++ programming language. | Based on the Python programming language. | Based on the Python programming language. |
| Approach | Combines C/C++ with specialized GPU code to accelerate computations. | The fundamental package for scientific computing with Python on conventional CPUs. | Uses NumPy syntax but runs on GPUs. |
| Float-to-integer casting | Casting behavior follows the CUDA specification. | Some float-to-integer casts (for example, a negative float to an unsigned integer, or infinity to an integer) are not defined in the C++ specification, so NumPy's results depend on the CPU architecture. | The same casts are not defined in the C++ specification, so CuPy's results can differ from NumPy's. |
| Random value generation | The cuRAND library provides random value generation in CUDA C/C++. | The legacy random functions have no dtype option and always return float64 values. | Random functions accept a dtype option (float32 or float64) because CuPy uses cuRAND. |
| Out-of-bounds integer array indexing | No bounds checking is performed; an out-of-bounds access is undefined behavior and may read invalid memory or trigger an illegal-memory-access error. | Raises an IndexError. | Wraps the out-of-bounds indices around instead of raising an error. |
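A small CPU-only sketch of the NumPy behavior described in the table. Plain NumPy indexing has no wrap-around mode, so CuPy's wrap-around is only emulated here with `np.take(..., mode="wrap")`; on a machine with CuPy and a GPU, indexing `cupy.array([0, 1, 2])` with `[1, 3]` should wrap in the same way. The random-sampling check uses NumPy's legacy `np.random.random_sample`.

```python
import numpy as np

x = np.array([0, 1, 2])

# NumPy: an out-of-bounds integer-array index raises IndexError.
try:
    x[[1, 3]]
    numpy_raised = False
except IndexError:
    numpy_raised = True

# CuPy-style wrap-around, emulated on the CPU (not CuPy itself):
# index 3 on a length-3 array wraps to 3 % 3 == 0.
wrapped = np.take(x, [1, 3], mode="wrap")  # array([1, 0])

# NumPy's legacy random_sample has no dtype argument and returns float64.
sample = np.random.random_sample(3)

print(numpy_raised, wrapped, sample.dtype)
```

By contrast, `cupy.random.random_sample` accepts a `dtype` argument such as `cupy.float32`, which is the difference noted in the random value generation row.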

 

CuPy lets you write a custom reduction kernel in much the same way as you implement an ElementwiseKernel. According to the official documentation, such a kernel is defined with the ReductionKernel class (a name also used in PyCUDA and PyOpenCL) by specifying the following four parts of the kernel code:

  1. Identity value: The initial value of the reduction.
  2. Mapping expression: Preprocesses each element before it is reduced.
  3. Reduction expression: An operator that reduces multiple mapped values. The special variables a and b are used as its operands.
  4. Post mapping expression: Transforms the resulting reduced values. The special variable a is used as its input, and the output should be written to the output parameter.
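As a sketch, the four parts can be seen in an L2-norm reduction along each row of a matrix. Because CuPy needs an NVIDIA GPU, the block first models the four stages on the CPU with a hypothetical helper, `reduce_rows` (not part of CuPy), and then, only if CuPy and a usable GPU are available, builds the equivalent `cupy.ReductionKernel`, whose positional arguments are the input parameters, output parameters, mapping expression, reduction expression, post mapping expression, identity value, and kernel name. Treat it as an illustration rather than a definitive implementation.

```python
import math

import numpy as np


def reduce_rows(x, identity, map_expr, reduce_expr, post_map_expr):
    """Hypothetical CPU model of the four ReductionKernel parts (axis=1)."""
    results = []
    for row in x:
        a = identity                      # 1. identity value: initial accumulator
        for element in row:
            b = map_expr(element)         # 2. mapping expression on each element
            a = reduce_expr(a, b)         # 3. reduction expression on a and b
        results.append(post_map_expr(a))  # 4. post mapping expression on a
    return np.array(results)


x = np.arange(10, dtype=np.float32).reshape(2, 5)

# L2 norm of each row: identity 0, map x*x, reduce a+b, post-map sqrt(a).
cpu_norms = reduce_rows(x, 0.0, lambda v: v * v, lambda a, b: a + b, math.sqrt)

# The same reduction as a CuPy ReductionKernel, attempted only if possible.
gpu_norms = None
try:
    import cupy as cp

    l2norm_kernel = cp.ReductionKernel(
        "T x",          # input parameter
        "T y",          # output parameter
        "x * x",        # mapping expression
        "a + b",        # reduction expression
        "y = sqrt(a)",  # post mapping expression
        "0",            # identity value
        "l2norm",       # kernel name
    )
    gpu_norms = cp.asnumpy(l2norm_kernel(cp.asarray(x), axis=1))
except Exception:
    # CuPy is not installed, or no usable CUDA device/toolkit was found.
    pass

print(cpu_norms)  # approximately [5.477, 15.969]
if gpu_norms is not None:
    print(np.allclose(cpu_norms, gpu_norms))
```

The first row reduces 0² + 1² + 2² + 3² + 4² = 30 and the second 5² + ... + 9² = 255, so the post mapping step returns sqrt(30) and sqrt(255).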