Hi HN,
I’m Padam, a developer based in Dubai.
Over the last 2 years I’ve been experimenting with the idea that AI inference might not require GPUs.
Modern LLM inference is often memory-bound rather than compute-bound, so I built an experimental system that virtualizes GPU-style parallelism from CPU cores using SIMD vectorization and quantization.
The result is AlifZetta — a prototype AI-native OS that runs inference without GPU hardware.
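To make the memory-bound argument concrete, here is a back-of-the-envelope sketch in Rust. It is not AlifZetta code. The function name and every number fed into it are hypothetical, and it assumes the simplest possible model: each generated token streams all weights from RAM exactly once, ignoring caches, KV-cache traffic, and batching.

```rust
/// Rough upper bound on decode speed in the memory-bound regime:
/// each generated token is assumed to read every weight from RAM once.
/// Hypothetical helper for illustration; not an AlifZetta measurement.
fn max_tokens_per_sec(params: f64, bits_per_weight: f64, mem_bandwidth_gb_s: f64) -> f64 {
    // Total bytes that must be streamed per token.
    let model_bytes = params * bits_per_weight / 8.0;
    // Bandwidth (bytes/s) divided by bytes per token gives tokens/s.
    mem_bandwidth_gb_s * 1e9 / model_bytes
}
```

Under these assumptions, `max_tokens_per_sec(7e9, 4.0, 50.0)` comes out at roughly 14 tokens/s for a 7B-parameter model at 4 bits per weight on a machine with 50 GB/s of memory bandwidth, versus about 3.6 tokens/s at 16 bits per weight. That ratio is the reason quantization matters so much for CPU inference.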
Some details:
• ~67k lines of Rust
• kernel-level SIMD scheduling
• INT4 quantization
• sparse attention acceleration
• speculative decoding
• 6 AI models (text, code, medical, image, research, local)
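For readers unfamiliar with the INT4 quantization item in the list above, a minimal sketch of one common scheme follows: symmetric quantization with one shared scale per block, two 4-bit values packed per byte. This is an assumption about the general technique, not AlifZetta's actual format; `Int4Block`, `quantize_int4`, and `dequantize_int4` are hypothetical names.

```rust
/// One block of weights stored as signed 4-bit integers plus a shared scale.
/// Hypothetical layout for illustration only.
struct Int4Block {
    scale: f32,
    packed: Vec<u8>, // two 4-bit values per byte, low nibble first
    len: usize,      // number of weights in the block
}

/// Quantize weights to the symmetric range [-7, 7] with a per-block scale.
fn quantize_int4(weights: &[f32]) -> Int4Block {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid division by zero for an all-zero block.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let mut packed = vec![0u8; (weights.len() + 1) / 2];
    for (i, w) in weights.iter().enumerate() {
        let q = (w / scale).round().clamp(-7.0, 7.0) as i8;
        // Keep the low 4 bits of the two's complement representation.
        let nibble = (q as u8) & 0x0F;
        packed[i / 2] |= nibble << ((i % 2) * 4);
    }
    Int4Block { scale, packed, len: weights.len() }
}

/// Unpack the 4-bit values and rescale them back to f32.
fn dequantize_int4(block: &Int4Block) -> Vec<f32> {
    (0..block.len)
        .map(|i| {
            let nibble = (block.packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
            // Sign-extend the 4-bit value: move it to the high nibble,
            // then arithmetic-shift back down as i8.
            let q = ((nibble << 4) as i8) >> 4;
            q as f32 * block.scale
        })
        .collect()
}
```

Each weight takes half a byte plus a small per-block overhead for the scale, and the round-trip error per weight is at most half the scale. Real implementations typically use small fixed block sizes and SIMD-friendly layouts so the unpack-and-multiply loop vectorizes well.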
Goal: make AI infrastructure cheaper and more accessible in places where GPUs are expensive.
Beta link:
https://ask.axz.si
Curious what HN thinks about this approach.