Prateek Shukla
low-level GPU systems · runtimes · kernels

Building below the framework layer. Where latency and hardware control matter more than abstraction comfort.

CUDA · PTX · Hopper · Blackwell · Inference Runtimes

Telos is my attempt to build a runtime where the machine is treated as the subject, not as a backend to hide behind.

I  ·  The Cathedral

The Machine

substrate · ground truth · ceremony of names
II  ·  The Runtime
a serving runtime, opinionated about the metal
Telos
multi-gpu llm serving · blackwell-only

Telos is intended to become a high-performance, multi-GPU LLM serving runtime focused only on Blackwell. It is powered by Telos-owned inference kernels, and everything that feeds them (the scheduler, KV cache, metadata, sampler, and result handling) is meant to add as little overhead as possible.

Kernels · KV Cache · Scheduler · Sampler · Graph buckets
Latency is the architecture. Everything else is a convenience that pays rent in microseconds.
III  ·  The Primitive Layer
expose the machine · remove paperwork
Hexel
cuda / ptx primitive library

Hexel is my CUDA/PTX primitive library. Its purpose is to make brutal low-level GPU programming cleaner without hiding the hardware or taking control away from the kernel author. Hexel should simplify mechanics, not semantics.

Expose the machine · Remove paperwork · Never steal the wheel
IV  ·  The Altar

The Stack

L5 · Applications
L4 · Telos Runtime
L3 · Telos-Owned Kernels
L2 · Hexel Core Primitives
L1 · CUDA · PTX · Hopper · Blackwell
L1 — Foundation
CUDA, PTX, Hopper, Blackwell. The ground truth every layer above is in conversation with.
V  ·  The Proof

Receipts.

work in the open · benchmarks · notes