Related: live demo of DeepSeek v4 Flash running on my 128GB MacBook. Italian lan...

dust42 · 2026-05-02T17:53:55 1777744435

For many models, llama.cpp performance on Mac is 20-40% lower than with MLX. Did you try MLX? At least on HF there are MLX 2-bit quants. Unfortunately I only have 64GB, so I can't test it.

antirez · 2026-05-02T17:55:20 1777744520

I'm not using llama.cpp there; it's my own inference engine, which is DeepSeek v4 specific. The goal is to optimize it as much as possible.

oveja · 2026-05-07T13:46:15 1778161575

That's cool!

I knew the name sounded familiar, thank you for SDS!