Has anybody used V4 hard, for the most challenging tasks (agentically, locally)?...

Oras · 2026-05-02T10:57:42 1777719462

I tried it for two tasks using Claude Code, on max effort.

1. Web platform: I asked it to analyse a feature that creates reports and to come up with a better solution and better UX. It did great, I would say on par with Sonnet 4.6 or even Opus, considering the thinking and explanation.

2. Mac app with some basic functionality: it did well from a functional perspective, but when I used Opus 4.7 to evaluate the result and suggest improvements, I noticed it had missed many vital points in the design system and usability.

I think it’s a leap. I haven’t used a model this capable that is not from OpenAI or Anthropic.

kroaton · 2026-05-02T12:49:31 1777726171

Claude Code poisons non-Anthropic models in usage. We found this out when the code was leaked. Use a fork or OpenCode/pi-coding-agent instead.

Oras · 2026-05-02T13:26:55 1777728415

Mind pointing to where you found this in the leaked code?

swader999 · 2026-05-02T13:01:01 1777726861

By poisons, do you mean it degrades their quality of output somehow?

segmondy · 2026-05-02T14:35:03 1777732503

That's what an evaluation dataset is for. Create your own and you can benchmark a model in a few hours to see if it fits your needs.
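
For anyone wondering what "create your own" could look like in practice, here is a minimal sketch of a personal eval harness. Everything in it is an assumption for illustration: `EvalCase`, `run_eval`, and `stub_model` are hypothetical names, the two cases are toy examples, and `stub_model` stands in for whatever call you actually make to your local model or agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_eval(model_fn: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    for case in cases:
        try:
            output = model_fn(case.prompt)
        except Exception:
            continue  # a crashed call counts as a failure
        if case.check(output):
            passed += 1
    return passed / len(cases)


# Hypothetical stand-in: replace with a real call to the model under test.
def stub_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I don't know"


# Toy cases; a real set would come from your own tasks and codebases.
CASES = [
    EvalCase("What is 2 + 2? Answer with a number only.",
             lambda out: out.strip() == "4"),
    EvalCase("What is the capital of France? One word.",
             lambda out: "paris" in out.lower()),
]

if __name__ == "__main__":
    print(f"pass rate: {run_eval(stub_model, CASES):.0%}")
```

Swap `stub_model` for each model you want to compare and run the same `CASES` against all of them. For agentic coding work the `check` functions would more likely run a test suite or a linter on the produced code than match strings.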