
ahmadyan · 2026-06-06T02:23:18 1780712598

Little do you know, LC-style questions were never about grinding LC. Algorithmic puzzles are one of the few legal ways of measuring a candidate's IQ without asking directly. Companies are looking for a way to hire smart people, so they rely on LC as a signal. It could be replaced with any similar signal (anything from "how many cats can you ship to the ISS" to solving black hole physics).

bediger4000 · 2026-06-06T03:58:44 1780718324

I might buy that, except for how cheesy the actual questions are.

If you subscribed to the old "Daily Coding Problem" email list, you'd know. Those guys collected actual questions asked in interviews ca 2010-2015, and sent them back out. About half were so poorly worded that interviewers couldn't possibly get anything out of them. Some of the questions required zero algorithmic thinking, or there was only one possible solution. Also, getting a flash of physical insight to solve a problem rarely happens when you're in a high-stress situation.

tptacek · 2026-06-06T16:02:01 1780761721

This is one of the most persistent myths in all of hiring. It is not unlawful to test IQ for white collar job candidates. Companies don't use IQ tests because they're not particularly effective, not because they'd get in trouble (reputation aside) for doing so. I don't believe anybody who (a) says stuff like this about Leetcode and (b) works professionally in this industry actually believes they could productively hire off an IQ test.

ahmadyan · 2026-06-03T21:19:09 1780521549

Some of the FAIR people moved to Thinky, and they also started doing encoder-free MM-LLMs. Now Google. This seems to be becoming a trend. It works at small scale, but the difficult part is scaling.

The standard approach for training MM-LLMs is to train the encoder first. There are O(2-10B) good images on the internet, and the encoder needs to see each image O(10-100) times; that is O(100T) tokens, which is more than the entire pre-training budget for most runs. That is why we train the encoder separately (a smaller model, 2B active vs. 30B or 200B active for the LLM). There is nothing magical about training the encoder and LLM together; it is just more token-efficient to train the image modality first.
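
A rough sketch of the arithmetic behind that estimate, for anyone who wants to plug in their own numbers. The image counts and passes per image are the ranges quoted above; the tokens-per-image figure (256) is purely an assumption for illustration, since the comment does not state one, and the function name is hypothetical.

```python
def encoder_token_budget(num_images: int, passes_per_image: int, tokens_per_image: int) -> int:
    """Total image tokens seen if every image is visited passes_per_image times."""
    return num_images * passes_per_image * tokens_per_image


# ASSUMPTION: 256 tokens per image. Not from the comment; real values depend
# on image resolution and patch size.
TOKENS_PER_IMAGE = 256

# Low end of the quoted ranges: 2B images, 10 passes each.
low = encoder_token_budget(2_000_000_000, 10, TOKENS_PER_IMAGE)
# High end of the quoted ranges: 10B images, 100 passes each.
high = encoder_token_budget(10_000_000_000, 100, TOKENS_PER_IMAGE)

print(f"low:  {low / 1e12:.2f}T tokens")
print(f"high: {high / 1e12:.2f}T tokens")
```

Under that assumption the range runs from roughly 5T to 256T tokens, so the O(100T) figure sits toward the upper end; a different tokens-per-image value shifts both ends proportionally.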

ahmadyan · 2026-05-30T23:31:04 1780183864

[web.archive.org](https://web.archive.org/web/20260526100726/https://www.nytim...)

ahmadyan · 2026-05-28T18:45:24 1779993924

pretty spot on.

In my experience, Opus 4.0 was fantastic, a major jump from 3.7. it was creative, super slow and expensive, and would sometimes forget what it was doing, but it got the job done.

4.1 they made it much faster, so a lot of infra improvements.

4.5 was when it could work on longer tasks and didn't make a lot of the obvious mistakes of 4.0. i think this was about the time opus went mainstream and anthropic's compute crisis began, so instead of making the model better they tried to optimize it to reduce cost.

4.6 was such a bad model, they switched to adaptive thinking and it had so many bugs. poor api design, benchmaxxed and poor real-world results. i switched back to 4.5.

4.7 they just fixed the bugs they added in 4.6. Better than 4.5.

haven't fully tested 4.8 yet.

sumedh · 2026-05-29T10:55:05 1780052105

> "4.6 was such a bad model,"

It's just amusing reading all these posts with different viewpoints. Just in this thread there are multiple people saying 4.6 was so much better than 4.7 and that they switched back to 4.6.

Otterly99 · 2026-05-29T15:20:27 1780068027

I also find it amusing. I also heard a lot of "4.7 is garbage, everybody hates it". Shows you how important proper validation techniques are, not just gut feeling.

ahmadyan · 2026-05-29T16:30:28 1780072228

that is a fair point, everything i said above was in my experience.

* in our experience, in our evals and codebase, 4.6 was a bad model. This is over 60k developers, so statistically significant.

teruakohatu · 2026-05-28T19:57:18 1779998238

I gave 4.6 a miss and only recently switched from 4.5 to 4.7. I found that a particularly difficult task 4.5 struggled with (getting stuck in loops and trying to convince me the problem had been solved) was quite solvable with 4.7.

ahmadyan · 2026-05-26T03:53:10 1779767590

I feel bad for Jony Ive, no amount of lipstick on a pig is going to save that horrendous car.

ahmadyan · 2026-05-21T20:22:45 1779394965

If you are in the bay area, i'm happy to buy that M3 Ultra from you, i've been unsuccessfully looking for one and can't find any.

ahmadyan · 2026-05-19T20:54:49 1779224089

Google is still very good, just get rid of evil stuff on their homepage to get the old google.

https://www.google.com/search?&udm=web&q=hackernews

ahmadyan · 2026-05-14T23:43:36 1778802216

i'm not sure if i'm hallucinating, but i swear i had codex in the chatGPT app from a long time ago (like the original codex on the web).

they added some new stuff, like remote control to wherever the desktop codex app is running, but these companies need to work much more on their press releases.

wahnfrieden · 2026-05-14T23:47:22 1778802442

That was cloud codex. Not comparable

ahmadyan · 2026-05-12T21:22:06 1778620926

yeah, even on product lines that they kill (like Stadia) they usually do right by the user (eg they refunded everyone, both on hardware and software people bought on the platform).

ahmadyan · 2026-05-04T02:21:25 1777861285

In my anecdotal experience, it is not. Same model, opus, works better in 3P harnesses such as Factory Droid or Amp.

Claude Code, on the other hand, is the most subsidized one, both for consumers (through the max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.

viking123 · 2026-05-04T13:30:41 1777901441

codex is way more subsidized currently, much more generous limits even for 20 dollars a month