Monday, June 25, 2012

What is CUDA?


This article serves as a prologue for beginners. Please note that this is a primer, not a tutorial; tutorials will follow.

CUDA (Compute Unified Device Architecture) is a parallel computing architecture that lets developers run their applications on CUDA-enabled processors. In essence, it gives us the ability to process in parallel, not just on the CPU. The CPU and GPU are treated as separate devices, each with its own address space and memory. Actual processing is delegated to the GPU via a costly memory transfer between the CPU and GPU; after the job is finished, the result is transferred back to the CPU for output to the user.
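To make that round trip concrete, here is a minimal sketch in CUDA's "extended C" (the kernel name scale and the sizes are illustrative choices, not from any particular SDK sample):

#include <stdio.h>
#include <cuda_runtime.h>

/* Kernel: each GPU thread doubles one element of the array. */
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev;
    cudaMalloc((void **)&dev, bytes);                       /* allocate GPU memory  */
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);   /* CPU -> GPU transfer  */

    scale<<<(n + 255) / 256, 256>>>(dev, n);                /* run on the GPU       */

    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);   /* GPU -> CPU transfer  */
    cudaFree(dev);

    printf("host[2] = %f\n", host[2]);                      /* prints 4.0 */
    return 0;
}

The two cudaMemcpy calls are exactly the costly transfers mentioned above; a real application tries to do enough work on the GPU to make them worthwhile.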

One of the main implications of CUDA is that algorithms can now be split and processed across many processing units (called CUDA cores) to achieve excellent performance. This promises to make practical some algorithms that were previously too slow to be usable.
CUDA cores are processing units with their own local memory as well as access to the GPU's shared global memory. Each of these "cores" is a capable processor in its own right, and together they can execute a great many threads at once. Threads are the smallest execution unit of a program and are created/coded to suit the algorithm being processed, as in the sketch below.
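As an illustration of coding threads to suit the algorithm, the kernel below adds two arrays: each thread computes its own global index from CUDA's built-in variables, and a grid-stride loop lets one thread handle several elements when there are more elements than threads (the kernel and parameter names are illustrative):

__global__ void add(const float *a, const float *b, float *out, int n)
{
    /* Each thread derives a unique global index from built-in variables. */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;   /* total number of threads launched */

    /* Grid-stride loop: if fewer threads than elements were launched,
       each thread simply processes several elements. */
    for (; i < n; i += stride)
        out[i] = a[i] + b[i];
}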

In terms of software, parallel code is written using extensions to C/C++, the favorite among developers being "extended C" due to its simplicity. Other languages can (and will) support CUDA via separately written libraries (for example, jCUDA for Java).
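For a rough flavor of what "extended C" actually adds (this sampler is illustrative, not exhaustive):

/* Function qualifiers mark where code runs and who may call it: */
__global__ void kernel(float *d);   /* __global__: GPU code, launched from the CPU      */
__device__ float square(float x)    /* __device__: GPU code, callable only from the GPU */
{
    return x * x;
}

/* Inside a kernel, built-in variables identify each thread:
   threadIdx, blockIdx, blockDim, gridDim.

   A kernel launch uses the triple-chevron syntax to choose how
   many threads to create:

       kernel<<<numBlocks, threadsPerBlock>>>(d_data);          */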

In terms of hardware, CUDA code must be run on a CUDA-enabled GPU. These devices come with hundreds of CUDA cores, a fundamental requirement for running code written for CUDA. NVIDIA provides a list of all CUDA-enabled GPUs here.

For a detailed theoretical explanation, you may now move on to Wikipedia (here) and then to the NVIDIA documentation provided with the SDK.
