Sunday, September 30, 2012

Matrix Multiplication: 3x3 and 3x1


Formula:

| a11 a12 a13 |    | b1 |    | a11*b1 + a12*b2 + a13*b3 |
| a21 a22 a23 | x | b2 | = | a21*b1 + a22*b2 + a23*b3 |
| a31 a32 a33 |    | b3 |    | a31*b1 + a32*b2 + a33*b3 |

For arrays:

| a11 a12 a13 |      | a[0][0] a[0][1] a[0][2] |
| a21 a22 a23 |  =  | a[1][0] a[1][1] a[1][2] |
| a31 a32 a33 |      | a[2][0] a[2][1] a[2][2] |

| b1 |    | b[0] |
| b2 | = | b[1] |
| b3 |    | b[2] |

| c1 |    | a11 a12 a13 |    | b1 |    | a[0][0]*b[0] + a[0][1]*b[1] + a[0][2]*b[2] |
| c2 | = | a21 a22 a23 | x | b2 | = | a[1][0]*b[0] + a[1][1]*b[1] + a[1][2]*b[2] |
| c3 |    | a31 a32 a33 |    | b3 |    | a[2][0]*b[0] + a[2][1]*b[1] + a[2][2]*b[2] |

Code:   

c[0] = a[0][0]*b[0] + a[0][1]*b[1] + a[0][2]*b[2];
c[1] = a[1][0]*b[0] + a[1][1]*b[1] + a[1][2]*b[2];
c[2] = a[2][0]*b[0] + a[2][1]*b[1] + a[2][2]*b[2];
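
For reference, here is a minimal self-contained version of the above (the function name mat_vec_mult and the sample values are my own, for illustration):

#include <stdio.h>

/* multiply the 3x3 matrix a by the 3x1 vector b, storing the result in c */
void mat_vec_mult(const double a[3][3], const double b[3], double c[3])
{
    c[0] = a[0][0]*b[0] + a[0][1]*b[1] + a[0][2]*b[2];
    c[1] = a[1][0]*b[0] + a[1][1]*b[1] + a[1][2]*b[2];
    c[2] = a[2][0]*b[0] + a[2][1]*b[1] + a[2][2]*b[2];
}

int main(void)
{
    double a[3][3] = {{1,0,0},{0,1,0},{0,0,1}};   /* identity matrix */
    double b[3] = {4, 5, 6};
    double c[3];
    mat_vec_mult(a, b, c);
    printf("%g %g %g\n", c[0], c[1], c[2]);       /* prints 4 5 6 */
    return 0;
}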

Matrix Inverse - 3x3

Operations on 3x3 matrices are very common in day-to-day applications. Because the dimensions are already known, it is better to avoid a for-loop and the extra computation involved in evaluating the subscripts.
Following is a quick formula for the inverse of a 3x3 matrix (implemented as a 2D array).

Formula

The inverse of a 3x3 matrix:
| a11 a12 a13 |-1             |   a33a22-a32a23   -(a33a12-a32a13)   a23a12-a22a13  |
| a21 a22 a23 |   = 1/DET *   | -(a33a21-a31a23)    a33a11-a31a13  -(a23a11-a21a13) |
| a31 a32 a33 |               |   a32a21-a31a22   -(a32a11-a31a12)   a22a11-a21a12  |

where DET is the determinant of the matrix, i.e.
DET  =  a11(a33a22 - a32a23) - a21(a33a12 - a32a13) + a31(a23a12 - a22a13)

For 2D array

Moving on from matrices to 2D arrays
| a11 a12 a13 |        | mat[0][0] mat[0][1] mat[0][2] |
| a21 a22 a23 |   =   | mat[1][0] mat[1][1] mat[1][2] |
| a31 a32 a33 |        | mat[2][0] mat[2][1] mat[2][2] |
and,
| a11 a12 a13 |-1     | inv[0][0] inv[0][1] inv[0][2] |
| a21 a22 a23 |   =   | inv[1][0] inv[1][1] inv[1][2] |
| a31 a32 a33 |        | inv[2][0] inv[2][1] inv[2][2] |

Hence, for code, the assignments (these give the adjugate; each entry must then be divided by DET) are:
inv[0][0] = mat[2][2] * mat[1][1] - mat[2][1] * mat[1][2];
inv[0][1] = mat[2][1] * mat[0][2] - mat[2][2] * mat[0][1];
inv[0][2] = mat[1][2] * mat[0][1] - mat[1][1] * mat[0][2];
inv[1][0] = mat[2][0] * mat[1][2] - mat[2][2] * mat[1][0];
inv[1][1] = mat[2][2] * mat[0][0] - mat[2][0] * mat[0][2];
inv[1][2] = mat[1][0] * mat[0][2] - mat[1][2] * mat[0][0];
inv[2][0] = mat[2][1] * mat[1][0] - mat[2][0] * mat[1][1];
inv[2][1] = mat[2][0] * mat[0][1] - mat[2][1] * mat[0][0];
inv[2][2] = mat[1][1] * mat[0][0] - mat[1][0] * mat[0][1];

DET = mat[0][0]*inv[0][0] + mat[1][0]*inv[0][1] + mat[2][0]*inv[0][2];
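
Putting it all together, a minimal sketch (the function name invert3x3, the double types and the singularity check are my additions):

#include <stdio.h>

/* invert the 3x3 matrix mat into inv; returns 0 if mat is singular */
int invert3x3(const double mat[3][3], double inv[3][3])
{
    /* adjugate entries, exactly as listed above */
    inv[0][0] = mat[2][2]*mat[1][1] - mat[2][1]*mat[1][2];
    inv[0][1] = mat[2][1]*mat[0][2] - mat[2][2]*mat[0][1];
    inv[0][2] = mat[1][2]*mat[0][1] - mat[1][1]*mat[0][2];
    inv[1][0] = mat[2][0]*mat[1][2] - mat[2][2]*mat[1][0];
    inv[1][1] = mat[2][2]*mat[0][0] - mat[2][0]*mat[0][2];
    inv[1][2] = mat[1][0]*mat[0][2] - mat[1][2]*mat[0][0];
    inv[2][0] = mat[2][1]*mat[1][0] - mat[2][0]*mat[1][1];
    inv[2][1] = mat[2][0]*mat[0][1] - mat[2][1]*mat[0][0];
    inv[2][2] = mat[1][1]*mat[0][0] - mat[1][0]*mat[0][1];

    double det = mat[0][0]*inv[0][0] + mat[1][0]*inv[0][1] + mat[2][0]*inv[0][2];
    if (det == 0.0)
        return 0;                        /* singular matrix: no inverse */

    for (int i = 0; i < 3; i++)          /* scale the adjugate by 1/DET */
        for (int j = 0; j < 3; j++)
            inv[i][j] /= det;
    return 1;
}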

Saturday, August 11, 2012

Forgotten rules of C/C++: Part 1


Following is a list of important points that people find very convenient to forget. This is the first article in a series.

·   For bitwise operations, the operand is promoted to "int" before evaluation.
unsigned char i = 0x80;
printf("%d", i << 1);                         => 256
      Without the promotion, the result would have wrapped around to 0 in 8 bits.

·   printf(5 + "intelligent");  => “ligent”
      The number specifies a displacement from the start of the string. This is the same as doing…
char *s = "intelligent";
s += 5;
printf("%s",s);

·    In a switch statement, statements (including initializations) placed before the first case label are allowed, but they are NEVER EXECUTED. Control passes directly to the matching case label.
switch(1)
{
     printf("hello");                         => NOT PRINTED
     case 1:printf("case 1");break;
     case 2:printf("case 2");break;
}

·   The ++ operator means ``add one to a variable'' and DOES NOT work with constants.
int i = ++ 3 ;
printf("%d", i);                              =>  COMPILATION ERROR (lvalue required as increment operand)

·   Array names are constants! The base address cannot be modified; any pointer arithmetic that attempts to do so causes a compilation error.
int arr[ 10 ] ;                     => “arr” is a constant. It evaluates to the address &arr[ 0 ].

·   Pointer logic for an array “int a[10][10];”
      a + 1
=> &a[1]
=> same address as &a[1][0] (one full row of 10 ints ahead)

·    In pointer arithmetic, adding an integer to a pointer and subtracting one pointer from another are valid operations (as long as they don’t try to change the base address of an array name), but pointer division and pointer multiplication are INVALID.
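
A quick sketch exercising these rules (the invalid lines are commented out; uncommenting them should reproduce the errors described above):

#include <stdio.h>

int main(void)
{
    int arr[10];
    int a[10][10];
    int *p = arr + 3;                     /* valid: pointer + integer */
    long diff = (arr + 7) - p;            /* valid: pointer - pointer */
    printf("%ld\n", diff);                /* prints 4 */
    printf("%p %p\n", (void*)(a + 1), (void*)&a[1][0]);  /* same address */
    /* arr++;      ERROR: an array name is not a modifiable lvalue */
    /* p = p * 2;  ERROR: pointer multiplication is invalid */
    return 0;
}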

Thursday, June 28, 2012

Code Snippet - Summed Area Table

SUMMED AREA TABLE
Using the method described here, I have written the following code to calculate a summed area table. A sample image table of 5x5 dimensions has been assumed for simplicity. The code snippet and its results are given below.

CODE
// assumed height and width of the output table
#define height 6
#define width 6

// input matrix - 5x5
long matrix[height - 1][width - 1] = {{5,2,3,4,1},{1,5,4,2,3},{2,2,1,3,4},{3,5,6,4,5},{4,1,3,2,6}};
// output matrix - 6x6
long sat[height][width];

void sat_matrix(){
    // formula variables
    long a = 0, b = 0, c = 0, m = 0;
    // matrix traversal loop for calculating the SAT
    for(int i = 0; i < height; i++){
        for(int j = 0; j < width; j++){
            // the following code picks up array elements within bounds
            // and picks "zero" for values outside bounds
            a = (i-1 >= 0) ? sat[i-1][j] : 0;
            b = (j-1 >= 0) ? sat[i][j-1] : 0;
            c = ((i-1 >= 0) && (j-1 >= 0)) ? sat[i-1][j-1]    : 0;
            m = ((i-1 >= 0) && (j-1 >= 0)) ? matrix[i-1][j-1] : 0;
            // ACTUAL FORMULA FOR SUMMED AREA TABLE
            sat[i][j] = m + a + b - c;
        }
    }
}

PROCEDURE:
Use the function as it is and write supporting code around it. The code was compiled using the g++ compiler in a Cygwin environment.
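
For reference, a minimal driver (my addition; it assumes the definitions above are in the same file):

#include <stdio.h>

// matrix, sat and sat_matrix() as defined above

int main(){
    sat_matrix();
    for(int i = 0; i < height; i++){
        // print one row of the summed area table
        for(int j = 0; j < width; j++)
            printf("%4ld", sat[i][j]);
        printf("\n");
    }
    return 0;
}

The first row and column of the output are all zeros, and the bottom-right entry is the sum of the entire input matrix (81 for the sample data above).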

OUTPUT:
[Screenshot of the resulting 6x6 summed area table]
Tuesday, June 26, 2012

Summed Area Table



Definition
Summed Area Table (SAT) is both an algorithm and a data structure, used in connection with the concept of integral images. The name refers both to the method and to the result of converting an image to an integral image.

SAT – The Algorithm
Summed Area Table (or Integral Image) is an algorithm applied to a 2-dimensional array of elements. It is a simple, single-pass algorithm to obtain the integral image values from the given pixel values of an image.

SAT – The Data Structure
SAT also refers to the table of values generated by the conversion to an integral image. This table is then used as input for speeding up more complicated operations.


Procedure
The algorithm takes as input a table of order n x n and returns a table of order (n+1) x (n+1). The fundamental operation is to apply the following formula:


I(x,y) = i(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1)
Where, 
  i(x,y) = Element of image array i[x][y]
  I(x,y) =  Element of integral image array I[x][y]


The procedure is illustrated by the code snippet in the "Code Snippet - Summed Area Table" post above.

Integral Image


Definition
An integral image is one that is conducive to frequent summation operations: the sum over any rectangular subset of such an image can be evaluated in constant time. Such an image is obtained by converting the pixel values to a SAT (summed area table) by accumulating the known pixel values.
An Integral Image is defined as,
I(x,y) = i(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1)
Where,
i(x,y) = Pixel value of base image at (x,y)
I(x,y) = Pixel value of integral image at (x,y)

How this helps
Once the integral image has been calculated, the sum of any rectangular subset can be calculated from just 4 array lookups, i.e.,
SUM = (Bottom right + top left – top right – bottom left)
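
As a sketch (the function rect_sum and its parameter names are mine), using the (n+1)x(n+1) table sat from the code-snippet post, where sat[i][j] holds the sum of the first i rows and first j columns:

// sum of the image rectangle with corners (x1,y1) and (x2,y2), inclusive
long rect_sum(int x1, int y1, int x2, int y2){
    return sat[x2+1][y2+1]    // bottom right
         + sat[x1][y1]        // top left
         - sat[x1][y2+1]      // top right
         - sat[x2+1][y1];     // bottom left
}

For the 5x5 sample data in that post, rect_sum(0, 0, 4, 4) returns 81, the sum of the whole image.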

Application
Despite being a very simple transform (mathematically), calculating integral images helps perform many complex calculations with ease. Processing an integral image is much simpler than processing a normal pixel table, hence the conversion to an integral image is a prerequisite step of several image processing algorithms.
Integral images are useful for calculating Haar wavelets, gradients, means and other measures. They are also used in image blurring, face recognition and similar algorithms.
The concept of integral images can easily be extended to the continuous domain (using limits) and to multidimensional images. The nature of the summation can also be modified to suit the algorithm, such as summation over non-rectangular areas.

Method
Integral images are calculated using the method described in the Summed Area Table post above.

Monday, June 25, 2012

How I Installed CUDA on my PC


Platform: Windows 7 (32 bit)
CUDA Hardware: None yet, will use the emulator till then.

1 Prologue
      Before you start, get a clear idea about CUDA and its features from here.

2 Installation
2.1 Software setups
The basic step is to download the latest SDK and choose an appropriate Toolkit according to your hardware’s compute capability. Download links are given here.
If you don’t have the hardware yet, you will need the emulator to compile and run programs (covered here). However, the emulator was deprecated in the 3.x updates, so you need to download CUDA Toolkit 2.3 from here.

2.2  Installation Steps
Thankfully, for Windows no post-installation configuration is required. Just run the setups and install away.


3      Choose your language: C/C++
3.1 C++
Visual Studio will be used for C++ development, for convenience. Express versions can be downloaded and registered at no cost. Or check with your country’s IEEE MSDNAA alliance website if you are a member.
(If other, better IDEs now have compatibility with CUDA, please notify me in the comments.)
Downloads:
CUDA VS WIZARD (Win 32) 2.00 (or latest version)
  
3.2 C
C is best used on Linux or a native Linux environment. For Windows users, the limited capability offered by the command line is satisfactory in this context. No separate setup is needed, as the environment variables are already in place and CUDA’s compiler, nvcc, can be invoked directly from the command line.
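
For example, with a source file named hello.cu (my placeholder) and Toolkit 2.3, whose nvcc still accepts the -deviceemu switch for device emulation:

nvcc -deviceemu hello.cu -o hello
hello.exe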

3.3 Java
I am, as of now, not committed to the idea of using Java for CUDA programs. But for reference, please go to this site.

After these steps have been completed, you are now ready to compile and execute CUDA programs.

CUDA Emulator

The CUDA emulator is software that duplicates (or emulates) the functions of a system with a CUDA-enabled card on a system without such hardware, so that the emulated behavior closely resembles that of the real system. This package is mainly aimed at empowering developers and students who do not have access to Nvidia GPUs.


Initially, the CUDA Toolkit came with an emulator "built into" the CUDA compiler, nvcc. In versions after 3.0, the emulator was dropped. The last version that supports the emulator is v2.3, which can be downloaded from the CUDA Toolkit archives here.


Thankfully, several third parties have contributed emulation options, which will be listed in this space shortly.

What is CUDA?


This article serves as a prologue for beginners. Please note that this is a primer, not a tutorial; tutorials will follow.

CUDA (Compute Unified Device Architecture) is a parallel processing architecture that gives developers the ability to run their applications on CUDA-enabled processors. Basically, it gives us the ability to process in parallel on the GPU, not just the CPU. The CPU and GPU are treated as separate devices, each with its own address space and memory. Actual processing is delegated to the GPU via a costly memory transfer between the CPU and GPU. After the job is finished, the result is transferred back to the CPU for output to the user.

One of the main implications of CUDA is that algorithms can now be split and processed on multiple processing units (called CUDA cores) to achieve excellent performance. This feature promises to add some viability to otherwise redundant or infeasibly slow algorithms.
CUDA cores are processing units with their own memory as well as access to the shared global memory of the GPU. Each of these “cores” is a powerful processor in itself and can execute many threads collectively. Threads are the smallest execution unit of a program and are created/coded to suit the algorithm being processed.
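
As a minimal sketch of this model (the kernel name add_arrays and all variable names are illustrative, not taken from the SDK samples):

#include <stdio.h>
#include <cuda_runtime.h>

// kernel: each GPU thread adds one pair of elements
__global__ void add_arrays(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 256;
    float a[n], b[n], c[n];
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    // allocate device (GPU) copies
    float *da, *db, *dc;
    cudaMalloc((void**)&da, n * sizeof(float));
    cudaMalloc((void**)&db, n * sizeof(float));
    cudaMalloc((void**)&dc, n * sizeof(float));

    // the "costly memory transfer" from CPU to GPU
    cudaMemcpy(da, a, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, n * sizeof(float), cudaMemcpyHostToDevice);

    // launch one block of n threads on the GPU
    add_arrays<<<1, n>>>(da, db, dc, n);

    // transfer the result back to the CPU
    cudaMemcpy(c, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%g\n", c[10]);               // prints 30

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}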

In terms of software, parallel code is written using extensions to C/C++/Java, the favorite among developers being “extended C” due to its simplicity. Other languages can or will support CUDA via separately written libraries (for example, jCUDA for Java).

In terms of hardware, CUDA must be run on CUDA-enabled GPUs. These devices come with hundreds of CUDA cores, a fundamental requirement for running code written for CUDA. NVIDIA provides a list of all CUDA-enabled GPUs here.

For a detailed theoretical explanation, you may now move on to Wikipedia (here) and then to the NVIDIA documentation provided with the SDK.