April 14, 2026 Reading time: 4 min
import cuda @cuda.kernel def vec_add(a, b, c): idx = cuda.thread_idx.x + cuda.block_idx.x * cuda.block_dim.x if idx < a.size: c[idx] = a[idx] + b[idx] vec_add[blocks, threads](a, b, c) cuda release news
find_package(CUDA REQUIRED) cuda_add_executable(myapp main.cu) New way (CUDA 13+): April 14, 2026 Reading time: 4 min import cuda @cuda
If you’re building HPC simulations, training LLMs, or optimizing edge inference, here’s what changed, what broke (sorry, legacy Kepler devs), and what to benchmark first. The biggest quality-of-life shift: cuda.compile and cuda.execute are now built into the core driver API. or optimizing edge inference
Old way (verbose, error-prone):