How do you run it? vllm? llama.cpp? Can you share some parameters you enable too...

jyap · 2026-04-30T16:24:14 1777566254

I run it with llama.cpp on my RTX 3090, also using the same Unsloth model.

I need to try out some of the other setups mentioned in this repo for increased TPS.

v3ss0n · 2026-05-05T08:36:34 1777970194

Both. I use a modified Jinja template optimized for tool calling. Tested in production, none of them works.

Both 27B and A3B have done all my production work beautifully (at Q8). I don't think any model is good at Q4.

Qwen 3.5 122b surpasses both of them tho.