dsmmcken · 2026-05-07T15:02:39 1778166159

The skill-creator that Claude provides is a decent jumping-off point. It is easy enough to start with, but unless the skill is really simple, I found it best to treat it as a scaffold for building more tailored evals and reports.

The report leaves out a lot of detail. Several changes I found useful were: pairing the with/without runs on the same screen as left/right for easier viewing; token count consumed by the skill; tokens used per run; time; pass rate; estimated cost; detailed aggregate stats; a parsed version of the conversation log (capturing the jsonl with each run, since sometimes reading the log is the only way to find out why it's screwing up); work output logging (in my case screenshots and outputted script code); and better formatting (syntax highlighting, log formatting).

Finally, I think the most useful thing was adding a self-reflection pass. After an eval is done, another agent looks at everything from that eval and tries to work out what went wrong along the way and what should be added to the skill, and conversely, from the without-skill run, what was in the skill that didn't need to be. It produces a skill change recommendation file for each eval. A further summary agent aggregates all those recommendations in a way I can feed back to an agent.

dsmmcken · 2026-02-25T00:04:28 1771977868

fwiw, I just tried running the agent-skill they provide, for fun, to migrate an App Router-based Next 15 site, and the end result entirely failed to start.

Vite just hangs when running vinext dev, with no output in logs whatsoever beyond printing `vinext dev (Vite 7.3.1)`.

dsmmcken · 2026-01-22T19:29:36 1769110176

Just curious, about how long did this project take you? I don't see that mentioned in the article.

cannoneyed · 2026-01-22T20:06:34 1769112394

We had our third kid in late November, and I worked sporadically on it over the following two months of paternity leave and holiday... If I had to bet, I'd say I put in well over 200 hours of work on it, the majority of that being manual auditing/driving of the generation process. If any AI model were reliable at checking the generated pixels, I could have automated this process, but they simply aren't there yet, so I had to do a lot more manual work than I'd anticipated.

All told I probably put in less than 20 hours of actual software engineering work, though, which consisted entirely of writing specs and iterating with various coding agents.

mrandish · 2026-01-22T20:42:35 1769114555

> If any AI model were reliable at checking the generated pixels, I could have automated this process, but they simply aren't there yet, so I had to do a lot more manual work than I'd anticipated.

Since the output is so cool and generally interesting, there might be an opportunity for those forking this for other cities to deploy a web app to crowdsource identifying broken tiles, and maybe classifying the error or even providing manual hints for the next run. It takes a village to make a (sim) city! :-)

cannoneyed · 2026-01-22T21:45:06 1769118306

Yeah I'll get the code out there soon - it's just very vibe-y right now, the repo is a bit of a mess since I never bothered to organize things. The secret sauce is really in the fine-tuning, can definitely get those datasets/models public on oxen.ai too

dsmmcken · 2025-12-05T23:42:08 1764978128

144, using css variables, with fallback and p instead of li

https://codepen.io/dsmmcken/pen/WbwYOEQ?editors=0100

p{counter-increment:n;--n:counter(n)}p:nth-child(3n){--f:"Fizz"}p:nth-child(5n){--b:"Buzz";--n:''}p::after{content:var(--f,var(--n))var(--b,'')}
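For readers who want to follow the golfed rule above, here is the same stylesheet expanded with comments. It is only a readability sketch: the declarations are unchanged apart from whitespace and quote style, and it assumes the page consists of empty `p` elements that are all siblings (so `:nth-child` lines up with the counter).

```css
/* Each <p> bumps a counter named n; --n holds the number to display. */
p {
  counter-increment: n;
  --n: counter(n);
}

/* Every third paragraph defines --f. */
p:nth-child(3n) {
  --f: "Fizz";
}

/* Every fifth paragraph defines --b and blanks out the number. */
p:nth-child(5n) {
  --b: "Buzz";
  --n: "";
}

/* Show --f if it is defined, otherwise fall back to --n;
   then append --b if it is defined, otherwise nothing. */
p::after {
  content: var(--f, var(--n)) var(--b, "");
}
```

The trick is the `var()` fallback: on a multiple of 3 the number is never read, and on a multiple of 5 the number is an empty string, so only "Buzz" shows.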

dsmmcken · 2025-09-21T03:31:17 1758425477

Level 48 is the last level, and you get a pdf certificate proving you are human.

dsmmcken · 2025-08-29T02:29:11 1756434551

Two of those wishlist CSS features already exist as specs:

> n-th child variable

See sibling-index() and sibling-count() https://developer.mozilla.org/en-US/docs/Web/CSS/sibling-ind...

> Reusable blocks

See @function and @mixin draft spec, https://drafts.csswg.org/css-mixins-1/ and https://css-tricks.com/functions-in-css/

Both are available in Chrome already.
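A minimal sketch of the first item, assuming a hypothetical list with the class `.stagger` (the class name and the 100ms step are illustrative, not from any spec). It only works in browsers that ship `sibling-index()` and `sibling-count()`.

```css
/* .stagger is a hypothetical class used only for this example. */
.stagger > li {
  /* sibling-index() is 1 for the first child, 2 for the second, and so on,
     so each item starts its transition 100ms after the previous one. */
  transition-delay: calc((sibling-index() - 1) * 100ms);

  /* sibling-count() is the total number of children of the parent,
     so opacity ramps up to 1 on the last item. */
  opacity: calc(sibling-index() / sibling-count());
}
```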

dsmmcken · 2025-07-09T13:47:15 1752068835

The tool we use for our docs AI answers lets you mine that data for feature requests. It generates a report of what it didn't have answers for and summarizes them as potential feature gaps. (Or at least what it is aware it didn't have answers for).

People seem more willing to ask an AI about certain things than be judged by asking the same question of a human, so in that regard it does seem to surface slightly different feature requests than we hear when talking to customers directly.

We use inkeep.com (not affiliated, just a customer).

rapind · 2025-07-09T14:56:55 1752073015

> We use inkeep.com (not affiliated, just a customer).

And what do you pay? It's crazy that none of these AI CSRs have public pricing. There should just be monthly subscription tiers, which include some number of queries, and a cost per query beyond that.

dsmmcken · on March 14, 2025

FYI the "watch video" button in the hero of https://codevideo.io/ doesn't work, missing the video ID.

fullstackchris · on March 14, 2025

Great catch - real link shipping to prod as we speak. Thanks for checking out the site!

dsmmcken · on March 10, 2025

And Bloomberg https://www.bloomberg.com/company/press/bloomberg-announces-...

dsmmcken · on Jan 14, 2025

For those unfamiliar: scrollbar-gutter: stable; https://developer.mozilla.org/en-US/docs/Web/CSS/scrollbar-g...
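A minimal sketch of how that property is typically applied. Putting it on `html` is an assumption for illustration; it can go on any scroll container.

```css
/* Reserve space for the vertical scrollbar even when the content
   does not overflow, so the layout does not shift when a scrollbar
   appears. Has no visible effect with overlay scrollbars. */
html {
  scrollbar-gutter: stable;
}
```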

andrewmcwatters · on Jan 14, 2025

Man, this makes me feel old.