
zzleeper · 2026-06-13T20:35:29 1781382929

I'm pretty sure only a small fraction of grants had this issue, and the cuts have meanwhile been very wide, without any sort of intelligent approach (I know ppl doing stuff like materials science at NASA who now have nothing to do because they cut costs of various inputs, while the very expensive lab equipment is sitting there unused)

zzleeper · 2026-06-13T03:54:47 1781322887

I asked it to tweak the fonts/colors of a very very simple static page and it blew through $35 (which is a lot for me lol; it's 10 days of my monthly codex plan).

rblatz · 2026-06-13T04:14:36 1781324076

You shouldn’t be using Fable for that, that’s Haiku work.

supuun · 2026-06-13T04:35:39 1781325339

I think they were hoping for a more advanced model’s more advanced “taste”

shdh · 2026-06-13T04:52:50 1781326370

This is a common sentiment among many users

user43928 · 2026-06-13T08:29:52 1781339392

I will never understand why people recommend using models with the capabilities of early 2025.

They cannot even move 10 existing lines of code around without breaking it in the process half of the time.

I very much doubt they are up to the task of implementing any sort of plan reliably enough to complete the work faster than writing the code by hand.

zzleeper · 2026-06-12T00:00:10 1781222410

I managed to write one that at least didn't have the font and colors (using 4.5)

Yesterday, I prompted Fable to improve the frontend to make it look different from the Claude style, gave detailed examples etc. 15 minutes and $32 (!) later (used Cursor lol) it gave me the shittiest, claudiest website ever, basically ignoring everything I asked

zzleeper · 2026-06-10T13:05:44 1781096744

I created pages with Claude before and it's very very obvious when you see one. From the font choice to the color palette, and the style of the boxes. In fact if anyone has an effective prompt that says "please don't make this look like the average Claude page" please post it!

Multicomp · 2026-06-10T14:11:24 1781100684

I've had some luck giving either an example website to ape or listing out a particular era, monkey see monkey do seems to help a bunch.

I've done each of the 3 below for side projects to pretty good effect.

> This website will be run by IE6 and Windows Mobile 6, so use no dependencies, semantic HTML, a 3-pane layout, and only use JS (es3!) where absolutely necessary (and where necessary, put the script at the end of the body).

When I'm not specifically targeting support for retrocomputers I do something like this, then iterate until it looks right.

> Go look at Dokuwiki, django defaults, and common web 2.0 color schemes, use those for UI inspiration. Keep a 3-pane desktop-first layout, but enable mobile responsiveness with media queries. Use semantic html5 and prefer older boring solutions like surgical jquery or htmx-style islands of interactivity where needed, otherwise do not bring in dependencies without my say so.

And finally, if I'm doing a web app that I'm vibing out with the web stack because I want it one-shotted and not trying to do a good rust core with strong ports/adapters API surface for web or native client callers, I do something like this:

> This is a local web app, the frontend, backend, and desktop are all on the same machine. Use naive and simple development patterns that you document the style as you go, pick a boring web framework and use it idiomatically, but remember that some tricks that are intended to keep network round trips down are not as necessary because network penalties are not as bad as real traffic.

Granted, the above I don't like as much, but it does produce more 'modern' looking sites by default.

ageitgey · 2026-06-10T13:15:12 1781097312

Anthropic's own frontend-design skill attempts to do that. You can install it in Claude Code, or you can tweak it to be closer to your own style:

https://github.com/anthropics/claude-code/blob/main/plugins/...

But what I find works best is to point Claude at a design system documentation website (your own company's or another public source) and tell it to use that design style. It usually does OK, and the results are usually much more in line with that style and not as Claude-y.

rlorenzo · 2026-06-10T16:22:41 1781108561

That skill has not been updated since its release.

I would suggest checking out this project for a boost in design skills:

https://impeccable.style

emodendroket · 2026-06-10T14:16:09 1781100969

Not sure that's necessarily any bigger a deal than when every Web site had the "Bootstrap look."

dannyw · 2026-06-11T03:29:49 1781148589

Fable is really, really good at this. My workflow involves giving it a bit of human inspiration, through asking it to generate a few different design templates/scaffolds (without building the whole frontend first).

Then I iterate and give it feedback, point out all the parts I don’t like, sometimes mixing and matching.

I’m sure you can do this with Opus too, but Fable is a better designer.

esikich · 2026-06-10T14:56:56 1781103416

Just give it actual ideas of what you want instead of "make me a web page". Garbage in garbage out.

KellyCriterion · 2026-06-11T07:49:55 1781164195

Though its standard look & feel is not that bad, IMHO.

zzleeper · 2026-06-09T22:43:38 1781045018

It's increasingly obvious that the only safeguard we've got is open models and semi-open ones like those from China. Crazy world

zzleeper · 2026-06-09T17:25:19 1781025919

How credible is this benchmark? Does it correlate with others' real-world experience?

bfeynman · 2026-06-09T17:59:25 1781027965

Given it was made by Cognition (team behind the Devin flop), who now just get to wait it out until Claude and GPT-5 basically do all of the work for them - not very. When you read about it, the framework is highly subjective. That very quickly becomes a problem because it's based on heuristics that probably change a bunch with a better code model.

vanuatu · 2026-06-09T18:02:15 1781028135

the subjective framework is exactly why it's good

prior bms relied mostly on unit tests or synthetic judges which are easily benchmaxxed, which leads to nobody trusting benchmarks

we need people manually checking the data for good code quality

vanuatu · 2026-06-09T18:00:26 1781028026

i worked on one of the benchmarks typically found in new model releases

this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scalable so don't take the benchmark as gospel, but directionally good)

Catloafdev · 2026-06-09T17:29:04 1781026144

It's a relatively new benchmark but from what I can tell it has serious cred behind it. I assume it will be picked up as part of the standard suite of CS-related benchmarks soon enough.

emp17344 · 2026-06-09T17:29:56 1781026196

Seems like it literally popped up yesterday with the express purpose of building hype for this release.

osti · 2026-06-09T18:13:05 1781028785

And a notable absence of the DeepSWE benchmark, where they do badly, but somehow a benchmark that was published yesterday is in this announcement.

zzleeper · 2026-06-09T21:27:27 1781040447

Exactly.. a bit of a red flag for me..

swyx · 2026-06-09T18:46:15 1781030775

team member here - we had been working on frontiercode for ~6-7months. timing just lined up

emp17344 · 2026-06-09T19:46:16 1781034376

Yeah, right. If this benchmark was truly developed in an independent manner, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark was revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.

oblio · 2026-06-09T22:15:18 1781043318

Come on, why are you a jerk about this?

Nobody would have 800+ billion reasons to lie by commission or omission here.

vanuatu · 2026-06-09T17:57:33 1781027853

i doubt it, cog wants coding agents to be better because it directly improves their product

they aren't married to a particular lab, most of their usage is their in house model i believe

anthonypasq · 2026-06-09T17:33:47 1781026427

what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.

bel8 · 2026-06-09T17:45:35 1781027135

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink · 2026-06-09T18:22:40 1781029360

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq · 2026-06-09T19:04:11 1781031851

you didn't answer my question. Why would Cognition be biased towards making Anthropic look good?

gloosx · 2026-06-10T07:42:14 1781077334

Because Cognition is a major customer of Anthropic?

anthonypasq · 2026-06-10T15:55:52 1781106952

they are also a major customer of OpenAI and every other model maker. whats your point?

schipperai · 2026-06-09T18:29:17 1781029757

Cognition did well in documenting their approach [1].

TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.

[1]: https://x.com/cognition/status/2064061031912288715

shimman · 2026-06-10T03:07:03 1781060823

It's an unacademic benchmark by a failed VC startup clawing for relevancy.

CSMastermind · 2026-06-09T21:17:38 1781039858

DeepSWE is the benchmark you want to actually look out for. Only one that aligns with actual user reported results from trying the models.

ryeguy · 2026-06-10T00:14:54 1781050494

Did you read the blog post? They compare to deepswe and call it out as the worst one for false positives (failed, but the benchmark assessed it as correct). It also has less language variance.

CSMastermind · 2026-06-10T05:23:52 1781069032

I mean yes that is what you'd say if you were writing a blog post about your new benchmark.

ryeguy · 2026-06-11T01:42:58 1781142178

Sure, but they at least quantified it with data. It's not like they just dropped a sentence saying the above, they showed numbers.

zzleeper · 2026-05-30T21:51:05 1780177865

Holy F.. $3 .. once I'm done with my base cursor allocation, each nontrivial question costs $5 . And yes, I'm now switching to a mix of codex and ds4pro

zzleeper · 2026-05-29T16:50:41 1780073441

Sorry that's confusing cash flow with profits, where things get amortized

zzleeper · 2026-05-29T16:32:59 1780072379

Let me know if you find one! I'm at a loss. (And even then, if I switch I have to pay $$$ taxes on capital gains)

stouset · 2026-05-29T16:41:57 1780072917

You can sidestep this entirely with a total-market fund like VTSAX/VTI, which hold the entire market and should be more resistant to being gamed.

They’re free-float adjusted so entities like SpaceX are valued only by what’s available on public markets. And Vanguard (and its funds) are owned by its investors, which makes it seem implausible that the rules would be rewritten in a way that would damage investors.

SilverElfin · 2026-05-29T16:43:02 1780072982

VTI lists fast even before these recent changes as I recall. So it’s more vulnerable, not less.

LPisGood · 2026-05-29T16:47:31 1780073251

It may list fast, but it covers many more securities from what I understand so it’s insulated. I think the fact is that any broad market ETF is gonna own at least some piece of a $1 trillion company.

svachalek · 2026-05-29T17:47:57 1780076877

The additional securities it includes are weighted by market cap though. So a total market fund ends up being 80% S&P 500, and even if they add thousands more companies those all fit in the 20% slot.

felixgallo · 2026-05-29T17:15:23 1780074923

well that's the problem, right? There is no justification for a trillion dollar Elon Musk valuation. And he and his investors know this. That's why they're trying to change the rules to dump the stock while it's irrational on every investor in the world. If they really believed in the value of the company, would they be bribing people to scam the index funds?

toomuchtodo · 2026-05-29T17:42:48 1780076568

Indeed, it's like robbing a bank while the bank is holding a party. Except it's the portfolios of everyone invested in index funds with potential exposure that are in scope.

High level, it's concerning to observe this unfold while almost every asset class is at its peak and there is no one willing to purchase (office real estate [1] [2], private equity [3], us equities [4], crypto, etc). Late Stage Capital Markets when you've exhausted greater fools available.

[1] Office Real Estate Is Facing ‘a Year of Reckoning’ in 2025 - https://www.bloomberg.com/news/features/2024-12-18/commercia... | https://archive.today/fTPSY - December 18, 2024

[2] Blackstone Is About to Take a 54% Loss on Iconic Seattle Tower - https://www.bloomberg.com/news/articles/2026-05-29/blackston... | https://archive.today/fcA8W - May 29, 2026

[3] https://qqrl.tk/item?id=47049024 (citations)

[4] BlackRock Scales Back Equities After ‘Generational’ Earnings [Peak S&P] - https://www.bloomberg.com/news/articles/2026-05-29/blackrock... | https://archive.today/lMIcH - May 29th, 2026

stouset · 2026-05-29T22:54:35 1780095275

VTI is float-adjusted so it will not treat SpaceX as if it has a trillion-dollar valuation. It will only consider the publicly tradable portion.
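
To make the float adjustment concrete, here is a rough sketch of how a float-adjusted, cap-weighted index would size a position. The company names, market caps, and float fractions are made-up illustrative numbers, not real data, and `float_adjusted_weights` is a hypothetical helper, not any fund's actual methodology:

```python
def float_adjusted_weights(companies):
    """Return index weights based on free-float market cap.

    companies maps a name to (total_market_cap, free_float_fraction).
    All figures used below are hypothetical.
    """
    # Only the publicly tradable portion of each company counts.
    float_caps = {
        name: cap * fraction for name, (cap, fraction) in companies.items()
    }
    total = sum(float_caps.values())
    return {name: fc / total for name, fc in float_caps.items()}


# Hypothetical market: one $1,000B company with only 10% of shares floating,
# plus the rest of the market at $50,000B, fully floating.
market = {
    "MegaCo": (1000.0, 0.10),
    "RestOfMarket": (50000.0, 1.0),
}

adjusted = float_adjusted_weights(market)

# Same market, but pretending every share of MegaCo is tradable.
full_cap = float_adjusted_weights(
    {name: (cap, 1.0) for name, (cap, _) in market.items()}
)

print(f"float-adjusted weight: {adjusted['MegaCo']:.4%}")
print(f"full-cap weight:       {full_cap['MegaCo']:.4%}")
```

Under these assumed numbers the float-adjusted weight is roughly a tenth of the full-cap weight, which is the point being made: the index only buys in proportion to what is actually tradable.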

_delirium · 2026-05-29T16:35:47 1780072547

Any of the direct indexing providers will let you blacklist individual stocks from the index. The intended use is to exclude stocks you hold elsewhere (or receive as stock grants) to avoid causing wash sales, but it can also be easily used to make a custom "S&P 499".
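
Conceptually, the exclusion amounts to dropping the ticker and rescaling the remaining weights so they sum to 1 again. A minimal sketch, with hypothetical tickers and weights; real providers also handle tax lots, tracking error, and rebalancing, none of which is modeled here:

```python
def exclude_and_renormalize(weights, excluded):
    """Drop excluded tickers and rescale the rest to sum to 1.

    weights maps ticker -> index weight; excluded is a set of tickers.
    """
    kept = {t: w for t, w in weights.items() if t not in excluded}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("nothing left in the index after exclusions")
    return {t: w / total for t, w in kept.items()}


# Hypothetical three-stock index; "CCC" is the stock to blacklist.
index_weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
custom = exclude_and_renormalize(index_weights, {"CCC"})
print(custom)
```

The remaining holdings keep their relative proportions; each one just takes a slightly larger share of the portfolio.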

zzleeper · 2026-05-29T16:42:45 1780072965

I'm looking at Schwab (and saw a few others) and couldn't find anything: https://www.schwab.com/learn/story/primer-on-wash-sales

I would assume this is not an ETF but sth else?

zparky · 2026-05-29T16:51:55 1780073515

https://www.schwab.com/direct-indexing

zzleeper · 2026-05-18T20:32:50 1779136370

Same. Whenever I see a PE acquisition, I immediately shift my purchases (eg namecheap last year)