Has anybody used V4 hard, for the most challenging tasks (agentically, locally)? It's so hard to compare without putting serious time into it, like spending a year with the model daily.


I tried it for two tasks using Claude Code, on max effort.

1. A web platform: I asked it to analyse a reporting feature and come up with a better solution and better UX. It did great; I would say on par with Sonnet 4.6, or even Opus, considering the thinking and explanation.

2. A Mac app with some basic functionality: it did well from a functional perspective, but when I then used Opus 4.7 to evaluate the result and suggest improvements, I noticed it had missed many vital points in the design system and usability.

I think it’s a leap; I haven’t used a model this capable that isn’t from OpenAI or Anthropic.


Claude Code poisons non-Anthropic models in usage. We found this out when the code was leaked. Use a fork, or OpenCode/pi-coding-agent.


Mind pointing out where in the leaked code you found this?


By poisons, do you mean it degrades their quality of output somehow?


That's what an evaluation dataset is for: create your own, and you can benchmark a model in a few hours to see if it fits your needs.
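A minimal sketch of such a personal eval in Python, assuming you wrap whichever model you're testing (a local V4, Sonnet, Opus, etc.) in a plain prompt-in, text-out function. `EvalCase`, `run_eval`, `pass_rate` and `fake_model` are hypothetical names for illustration, and `fake_model` is only a stand-in for a real client call.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the model's output is acceptable


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Send every case's prompt to the model and record pass/fail per case."""
    results: dict[str, bool] = {}
    for case in cases:
        try:
            output = model(case.prompt)
            results[case.name] = bool(case.check(output))
        except Exception:
            # A crash, timeout or unparsable answer counts as a failure
            # instead of aborting the whole run.
            results[case.name] = False
    return results


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty run)."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical cases: replace these with tasks from your own work.
cases = [
    EvalCase("json_keys", "Return a JSON object with keys 'a' and 'b'.",
             lambda out: set(json.loads(out)) == {"a", "b"}),
    EvalCase("mentions_index", "How would you speed up this slow SQL query?",
             lambda out: "index" in out.lower()),
    EvalCase("defines_slugify", "Write a Python function named slugify.",
             lambda out: "def slugify" in out),
]


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; swap in your own client here.
    if "JSON" in prompt:
        return '{"a": 1, "b": 2}'
    return "Try adding a composite index."


results = run_eval(fake_model, cases)
print(results, round(pass_rate(results), 3))
```

Checks are plain predicates, so you can grade on anything you can test mechanically (parsable output, required keywords, or running generated code against unit tests). Run the same cases against each model and compare pass rates.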
