Accelerated computing is increasingly replacing CPU-only computing for highly parallel workloads. NVIDIA builds some of the world's most powerful parallel-processing GPUs. CUDA gives programmers a way to leverage the massive parallelism of NVIDIA GPUs from common programming languages, including C and C++.

Introduction to CUDA – Read Here

Writing Hello World for CUDA:

In our example, we will use C/C++ as the programming language. CUDA-accelerated programs use the .cu file extension. To run this example, you need a system equipped with a CUDA-capable GPU, and the system must be configured to run CUDA.
More information about system configuration can be read here.

Example Program
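
A minimal Hello World along the lines described in this post might look like the sketch below. The kernel name hello_world and the message text are illustrative choices, not taken from the original listing; the launch uses a single block with a single thread.

```cuda
#include <cstdio>

// __global__ marks a kernel: a function that runs on the GPU (device)
// and is launched from CPU (host) code.
// The name hello_world is a hypothetical choice for this sketch.
__global__ void hello_world() {
    printf("Hello World from the GPU!\n");
}

int main() {
    // Launch the kernel with 1 block containing 1 thread.
    hello_world<<<1, 1>>>();

    // Block the host until the kernel has finished, so its output
    // is flushed before the program exits.
    cudaDeviceSynchronize();
    return 0;
}
```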

In the above program, the __global__ keyword indicates that the following function runs on the GPU and is launched from the CPU.
Code that runs on the CPU is known as host code, and code that runs on the GPU is known as device code.

Function_name<<<No. of blocks, No. of threads per block>>>(); – used to call a function that runs on the GPU.
Threads are the individual units of execution on the GPU, and a group of threads is known as a block. For example, you can group 10 threads into a single block: <<<1,10>>> asks for 1 block of 10 threads, so the function is invoked 10 times, once per thread. <<<10,10>>> creates 10 blocks of 10 threads each, so the same function runs 100 times.
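
To make the launch configuration concrete, here is a small sketch in which every thread prints its own block and thread index. The kernel name who_am_i and the host helper total_threads are hypothetical names introduced for illustration; blockIdx.x and threadIdx.x are CUDA's built-in index variables.

```cuda
#include <cstdio>

// Each thread prints which block it belongs to and its index inside that block.
// who_am_i is a hypothetical name used only in this sketch.
__global__ void who_am_i() {
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

// Hypothetical host helper: number of times the kernel body runs
// for a one-dimensional launch.
int total_threads(int blocks, int threads_per_block) {
    return blocks * threads_per_block;
}

int main() {
    const int blocks = 2;
    const int threads_per_block = 4;

    printf("launching %d threads in total\n",
           total_threads(blocks, threads_per_block));

    // 2 blocks of 4 threads each: the kernel body runs 8 times.
    who_am_i<<<blocks, threads_per_block>>>();

    // Wait for the device so its output appears before the program exits.
    cudaDeviceSynchronize();
    return 0;
}
```

The order in which different blocks print is not guaranteed, so the lines may appear grouped differently from run to run.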
The function cudaDeviceSynchronize() tells the host code to wait until the device code has finished executing. Without this synchronization, the host code would continue regardless of whether the device code has completed, and the program could exit before the GPU output is printed.
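
Because a kernel launch returns to the host immediately, it can also be useful to check for errors at the synchronization point. The following is a sketch under that assumption, not part of the original program; cuda_ok is a hypothetical helper, while cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are CUDA runtime functions.

```cuda
#include <cstdio>

__global__ void hello_world() {
    printf("Hello World from the GPU!\n");
}

// Hypothetical helper: prints a message if err signals a failure
// and returns true only on success.
bool cuda_ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    hello_world<<<1, 1>>>();

    // The launch itself returns right away; launch errors are reported here.
    if (!cuda_ok(cudaGetLastError(), "kernel launch")) return 1;

    // Wait for the kernel to finish; errors raised while it ran surface here.
    if (!cuda_ok(cudaDeviceSynchronize(), "kernel execution")) return 1;

    return 0;
}
```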

The above code runs a basic Hello World on CUDA. Try it yourself.

To compile the program, we use the NVIDIA CUDA Compiler, nvcc, which compiles both host and device code.

To compile, run the following command on the Command Prompt:

nvcc <file>.cu -arch=sm_75 -o out -run

Here <file>.cu stands for the file to be compiled. -arch indicates the architecture to compile the file for. In my case, I am using a GPU with the Turing architecture. Check your architecture code here.

-o out names the compiled executable out, and -run executes the code after it is compiled.
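
If you are unsure which value to pass to -arch, one option is to ask the CUDA runtime for the compute capability of your GPU. The sketch below assumes the GPU of interest is device 0; arch_code is a hypothetical helper that combines the major and minor version into the number used after sm_.

```cuda
#include <cstdio>

// Hypothetical helper: compute capability 7.5 becomes 75, as in sm_75.
int arch_code(int major, int minor) {
    return major * 10 + minor;
}

int main() {
    cudaDeviceProp prop;

    // Query device 0 (assumes at least one CUDA-capable GPU is installed).
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "could not query device 0: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d, so use -arch=sm_%d\n",
           prop.name, prop.major, prop.minor,
           arch_code(prop.major, prop.minor));
    return 0;
}
```

This program can be compiled without the -arch flag, since it launches no kernels of its own.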

More CUDA series posts to be added soon. Stay tuned.

GitHub Repo – arul2810/CUDA
