These benchmarks mean very little. The real test is model + harness so agentic ... | Hacker News


syntex 33 days ago | on: Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in...

These benchmarks mean very little. The real test is model + harness, i.e. an agentic system that can fulfill given goals.
