
Tuesday, June 26, 2012

Summed Area Table



Definition
A Summed Area Table (SAT), closely related to the concept of integral images, is both an algorithm and a data structure: the name refers both to the method of converting an image into an integral image and to the table of values that results.

SAT – The Algorithm
As an algorithm, the Summed Area Table (or Integral Image) is applied to a 2-dimensional array of elements. It is a simple, single-pass method that computes the integral image values from the given pixel values of an image.

SAT – The Data Structure
SAT also refers to the table of values generated by the conversion to an integral image. This table is then used as input to speed up more complicated operations, such as computing the sum of any rectangular region of the image in constant time.


Procedure
The algorithm takes as input a table of order n×n and returns a table of order (n+1)×(n+1); the extra row and column are zero-padded so the recurrence needs no boundary checks. The fundamental operation is to apply the following formula:


I(x,y) = i(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1)
where
  i(x,y) = element of the image array i[x][y]
  I(x,y) = element of the integral image array I[x][y]


The procedure can be better explained by a code snippet.

Monday, June 25, 2012

How I Installed CUDA on my PC


Platform: Windows 7 (32 bit)
CUDA Hardware: None yet; I will use the emulator until then.

1 Prologue
      Before you start, get a clear idea about CUDA and its features from here.

2 Installation
2.1 Software setups
The basic step is to download the latest SDK and to choose a Toolkit appropriate to your hardware’s compute capability. Download links are given here.
If you don’t have the hardware yet, you will need to use the emulator to compile and run programs (covered here). The emulator was, however, dropped in the 3.x releases, so you need to download CUDA Toolkit 2.3 from here.

2.2  Installation Steps
Thankfully for Windows, no post-installation configuration is required. Just run the setups and install away.


3 Choose your language: C/C++
3.1 C++
Visual Studio will be used for C++ development, for convenience. Express versions can be downloaded and registered at no cost. Alternatively, check your country’s IEEE MSDNAA alliance website if you are a member.
(If other IDEs now offer CUDA compatibility, please notify me in the comments.)
Downloads:
CUDA VS WIZARD (Win 32) 2.00 (or latest version)
  
3.2 C
C is best used on Linux or a native Linux-like environment. For Windows users, the limited capability offered by the command line is satisfactory in this context. No separate setup is needed, as the environment variables are already in place and CUDA’s compiler, nvcc, can be invoked directly from the command line.
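As an illustration, a file named kernel.cu (a hypothetical name) can be compiled and run from the command prompt like this; note that the -deviceemu flag is only available in Toolkit 2.3 and earlier:

```
:: Compile for a CUDA-capable GPU
nvcc kernel.cu -o kernel.exe

:: Or compile for the emulator (Toolkit 2.3 and earlier only)
nvcc -deviceemu kernel.cu -o kernel.exe

:: Run the resulting program
kernel.exe
```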

3.3 Java
I am, as of now, not committed to the idea of using Java for CUDA programs. But for reference, please go to this site.

After these steps have been completed, you are now ready to compile and execute CUDA programs.

CUDA Emulator

The CUDA emulator is software that duplicates (or emulates) the functions of a CUDA-enabled card on a computer system with no such hardware, so that the emulated behavior closely resembles the behavior of the real system. This software package is mainly aimed at empowering developers and students who do not have access to Nvidia GPUs.


Initially, the CUDA Toolkit came with an emulator built into the CUDA compiler, nvcc. From versions after 3.0, however, the emulator was dropped. The last version that supports it is v2.3, which can be downloaded from the CUDA Toolkit archives here.


But thankfully, several third parties have contributed emulation options, which will be listed in this space shortly.

What is CUDA?


This article serves as a prologue for beginners. Please note that this is a primer, not a tutorial; tutorials will follow.

CUDA (Compute Unified Device Architecture) is a parallel processing architecture that gives developers the ability to run their applications on CUDA-enabled processors. In essence, it gives us the ability to process in parallel on the GPU, not just the CPU. The CPU and GPU are treated as separate devices, each with its own address space and memory. Actual processing is delegated to the GPU via a costly memory transfer between the CPU and GPU; after the job is finished, the result is transferred back to the CPU for output to the user.

One of the main implications of CUDA is that algorithms can now be split and processed on multiple processing units (called CUDA cores) to achieve excellent performance. This promises to make viable algorithms that were otherwise redundant or unfeasibly slow.
CUDA cores are processing units with their own memory as well as access to the shared global memory of the GPU. Each of these “cores” is a powerful processor in itself and can execute many threads collectively. Threads are the smallest execution unit of a program and are created/coded to suit the algorithm being processed.
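The workflow described above (allocate on the device, copy over, launch many threads, copy back) can be sketched as a minimal CUDA C program; the kernel name and sizes here are arbitrary, and error checking is omitted for brevity:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Kernel: each thread doubles one element of the array. */
__global__ void doubleElements(int *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] *= 2;
}

int main(void)
{
    const int n = 256;
    int host[256];
    int *dev;

    for (int i = 0; i < n; ++i)
        host[i] = i;

    cudaMalloc((void **)&dev, n * sizeof(int));                      /* allocate on GPU */
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);  /* CPU -> GPU      */
    doubleElements<<<(n + 127) / 128, 128>>>(dev, n);                /* launch threads  */
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);  /* GPU -> CPU      */
    cudaFree(dev);

    printf("%d\n", host[10]);   /* 20 */
    return 0;
}
```

The two cudaMemcpy calls are the “costly memory transfer” mentioned above; real programs try to minimize them relative to the work done on the device.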

In terms of software, parallel code is written using extensions to C/C++/Java, the favorite among developers being “extended C” due to its simplicity. Other languages can (and will) support CUDA via separately written libraries (for example, jCUDA for Java).

In terms of hardware, CUDA code must be run on CUDA-enabled GPUs. These devices come with hundreds of CUDA cores, a fundamental requirement for running code written for CUDA. NVIDIA provides a list of all CUDA-enabled GPUs here.

For a detailed theoretical explanation, you may now move on to Wikipedia (here) and then to NVIDIA Documentation provided with the SDK.