Trending

See what the GitHub community is most excited about today.

HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Cuda · 8,842 stars · 323 forks

809 stars today

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 18,923 stars · 1,988 forks

30 stars today

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda · 423 stars · 59 forks

0 stars today

brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book

Cuda · 1,373 stars · 301 forks

2 stars today

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 671 stars · 54 forks

3 stars today

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,651 stars · 443 forks

0 stars today

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 1,262 stars · 276 forks

2 stars today

nerfstudio-project / gsplat

CUDA accelerated rasterization of gaussian splatting

Cuda · 882 stars · 88 forks

1 star today

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Cuda · 625 stars · 173 forks

1 star today

NVIDIA / nccl-tests

NCCL Tests

Cuda · 686 stars · 211 forks

1 star today

66RING / tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda · 44 stars · 3 forks

1 star today