Here’s a load test where they run 4 models in real time on the same device:
- Qwen3-TTS - text-to-speech
- Parakeet v2 - NVIDIA speech-to-text model
- Canary v2 - multilingual / translation STT
- Sortformer - speaker diarization (“who spoke when”)
https://x.com/atiorh/status/2027135463371530695