#BasicsNode [[Grids & Blocks & Threads]] [[Basics of Coding in CUDA]]

The majority of this text is taken from the NVIDIA docs, section 3.

**CUDA:** Compute Unified Device Architecture
**NVCC:** NVIDIA CUDA Compiler

CUDA is an extension of C/C++ and therefore cannot be compiled by a standard C compiler. Instead, it is compiled using NVCC, which compiles both device-specific and host-specific code. Since the resulting executable contains instructions for multiple processor types (CPU and GPU), it is called a "fat binary".

1. **Device code:** the program is first preprocessed for device (GPU) compilation into CUDA binaries (`cubin`) and/or PTX intermediate code. These are then placed into a `fatbinary`.
2. **Host code:** the program is then preprocessed a second time for host (CPU) compilation. This is where CUDA-specific C++ extensions (such as the `<<<...>>>` kernel-launch syntax) get translated into standard C++, with the `fatbinary` embedded in the host code.
3. Whenever the host program launches device code, the `fatbinary` is inspected and the appropriate device image for the current GPU is obtained at runtime (either a prebuilt `cubin`, or PTX that is JIT-compiled for that GPU).

Sources (excl. book): https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
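A minimal sketch (not from the source) of what NVCC splits apart: the `__global__` kernel below becomes device code (compiled to cubin/PTX and packed into the fatbinary), while `main` and the `<<<...>>>` launch are rewritten into standard C++ runtime calls during host compilation. The file name and the `sm_80` architecture are illustrative assumptions.

```
// writeidx.cu -- hypothetical example; compile with, e.g.:
//   nvcc -gencode arch=compute_80,code=sm_80 \
//        -gencode arch=compute_80,code=compute_80 writeidx.cu -o writeidx
// The first -gencode embeds a real cubin for sm_80; the second embeds PTX
// so newer GPUs can JIT-compile it at runtime (step 3 above).
#include <cstdio>

// Device code: each thread writes its global index into out[].
__global__ void writeIdx(int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

int main() {
    const int n = 8;
    int h_out[n];
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));

    // Host code: <<<blocks, threads>>> is the CUDA-specific extension
    // that NVCC translates into standard C++ during host compilation.
    writeIdx<<<2, 4>>>(d_out);

    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);
    printf("\n");
    cudaFree(d_out);
    return 0;
}
```

Passing two `-gencode` flags is one way to see the "fat binary" idea concretely: the single executable carries both a GPU-specific binary and retargetable PTX, and the runtime picks whichever matches the GPU it finds.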