This article serves as a prologue for beginners. Please note that this is
a primer, not a tutorial; tutorials will follow.
CUDA (Compute Unified Device Architecture) is a parallel computing
architecture that gives developers the ability to run parts of their
applications on CUDA-enabled processors. In essence, it gives us the ability
to process in parallel on the GPU instead of relying on the CPU alone. The CPU
(the “host”) and the GPU (the “device”) are treated as separate devices, each
with its own address space and memory. Work is delegated to the GPU via a
costly memory transfer between the CPU and GPU; after the job is finished, the
result is transferred back to the CPU for output to the user.
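As a minimal sketch of that workflow in CUDA C (the kernel name squareKernel and the sizes here are my own illustration, not from any particular SDK sample), a program allocates device memory, copies the input over, launches a kernel, and copies the result back:

    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Illustrative kernel: squares each element of an array in place. */
    __global__ void squareKernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= data[i];
    }

    int main(void)
    {
        const int n = 1024;
        float host[1024];
        for (int i = 0; i < n; ++i)
            host[i] = (float)i;

        float *dev;
        cudaMalloc(&dev, n * sizeof(float));                              /* allocate GPU memory      */
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); /* the costly transfer in   */
        squareKernel<<<(n + 255) / 256, 256>>>(dev, n);                   /* delegate work to the GPU */
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost); /* transfer the result back */
        cudaFree(dev);

        printf("host[3] = %f\n", host[3]); /* prints 9.0 */
        return 0;
    }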
One of the main implications of CUDA is that algorithms can now be split
up and processed on many processing units (called CUDA cores) to achieve
excellent performance. This promises to make viable algorithms that were
previously impractical or unfeasibly slow, as the sketch below illustrates.
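To make “splitting” concrete, here is how a sequential CPU loop maps onto a kernel in which each iteration becomes one GPU thread (a sketch using the classic SAXPY operation, my choice of illustration rather than the article's):

    /* Sequential CPU version: one core walks the whole array. */
    void saxpy_cpu(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /* Parallel GPU version: the loop disappears; each thread handles
       exactly one index, so the work is split across the CUDA cores. */
    __global__ void saxpy_gpu(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }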
CUDA cores are processing units with their own local registers as well as
access to the shared global memory of the GPU. Individually each of these
“cores” is a simple processor, but collectively they can execute thousands of
threads at once. Threads are the smallest execution unit of a program and are
created/coded to suit the algorithm being processed, as shown below.
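The way a thread is “coded to suit the algorithm” is through built-in index variables: each thread computes which piece of the data belongs to it. The sketch below (a deliberately naive block-sum, again my example and not the article's) also shows per-block shared memory alongside global memory:

    /* Assumes a launch with exactly 256 threads per block. */
    __global__ void blockSum(const float *in, float *out)
    {
        /* Every thread derives a unique identity from these variables. */
        int local  = threadIdx.x;                     /* index within the block */
        int global = blockIdx.x * blockDim.x + local; /* index within the grid  */

        /* __shared__ memory is visible to all threads of one block;
           in and out live in the GPU's global memory. */
        __shared__ float tile[256];
        tile[local] = in[global];
        __syncthreads(); /* wait until the whole block has loaded its tile */

        /* Thread 0 of each block sums its tile (kept naive on purpose;
           real code would use a tree reduction). */
        if (local == 0) {
            float s = 0.0f;
            for (int i = 0; i < blockDim.x; ++i)
                s += tile[i];
            out[blockIdx.x] = s;
        }
    }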
In terms of software, parallel code is written using extensions to C/C++,
the favorite among developers being “extended C” (CUDA C) due to its
simplicity; a glimpse of what those extensions look like follows below. Other
languages can and will support CUDA via separately written libraries (for
example, jCUDA for Java).
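For a sense of what “extended C” actually adds on top of plain C, here is a non-exhaustive sketch (the names kernel, helper, and result are mine, chosen only for illustration):

    #include <cuda_runtime.h>

    __device__ float result[256];    /* a variable living in GPU global memory  */

    __device__ float helper(float x) /* runs on the GPU, callable only from GPU */
    {
        return x * 2.0f;
    }

    __global__ void kernel(void)     /* runs on the GPU, launched from the CPU  */
    {
        result[threadIdx.x] = helper((float)threadIdx.x);
    }

    int main(void)
    {
        /* The triple-angle-bracket launch syntax is the most visible
           extension: <<<number of blocks, threads per block>>>. */
        kernel<<<1, 256>>>();
        cudaDeviceSynchronize(); /* wait for the GPU to finish */
        return 0;
    }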
In terms of hardware, CUDA code needs to be run on CUDA-enabled GPUs. These
devices come with hundreds of CUDA cores, a fundamental requirement for
running code written for CUDA. NVIDIA provides a list of all CUDA-enabled GPUs
here; you can also query your own device at run time, as shown below.
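If you are unsure whether your machine qualifies, the CUDA runtime can report what it finds (a small sketch; note that the API exposes multiprocessor counts rather than individual core counts, since cores per multiprocessor vary by architecture):

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("No CUDA-enabled GPU found.\n");
            return 1;
        }
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
                   d, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
        }
        return 0;
    }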
For a detailed theoretical explanation, you may now move on to Wikipedia
(here) and then to the NVIDIA documentation provided with the SDK.