
What LLM do you guys use for fast inference in voice/phone agents? I feel like I need to "cheat" with Cerebras, Groq, or SambaNova to get really good latency.

Haiku 4.5 is very good, but it still seems to add about a second of latency.


