Hacker News | dsmmcken's comments

The Claude-provided skill-creator is a decent jumping-off point. It is easy enough to start with, but unless the skill is really simple, I found it best to treat it as a scaffold for building more tailored evals and reports.

The report leaves out a lot of detail. Several changes I found useful: pairing the with/without runs side by side on the same screen for easier viewing; the token count the skill itself consumes; tokens used per run; time; pass rate; estimated cost; detailed aggregate stats; a parsed version of the conversation log (capture the jsonl with each run, since sometimes reading the log is the only way to find out why it's screwing up); work output logging (in my case, screenshots and the generated script code); and better formatting (syntax highlighting, log formatting).
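As a rough illustration of the aggregate stats described above, here is a minimal Python sketch. It is not the skill-creator's actual report format: the `Run` record, its field names, the `summarize` helper, and the flat per-token price are all hypothetical, chosen only to show pass rate, token, time, and estimated-cost figures grouped by eval and by with/without-skill.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    # Hypothetical per-run record; field names are assumptions, not a real schema.
    eval_name: str
    with_skill: bool
    passed: bool
    tokens: int
    seconds: float


def summarize(runs, usd_per_million_tokens):
    """Aggregate runs into one row per (eval_name, with_skill) pair."""
    groups = {}
    for run in runs:
        groups.setdefault((run.eval_name, run.with_skill), []).append(run)

    summary = {}
    for key, group in groups.items():
        total_tokens = sum(r.tokens for r in group)
        summary[key] = {
            "runs": len(group),
            "pass_rate": sum(r.passed for r in group) / len(group),
            "mean_tokens": mean(r.tokens for r in group),
            "mean_seconds": mean(r.seconds for r in group),
            # Assumes a single flat price; real pricing usually differs
            # for input and output tokens.
            "est_cost_usd": total_tokens * usd_per_million_tokens / 1_000_000,
        }
    return summary
```

Keying the rows on `(eval_name, with_skill)` makes it easy to render the two variants of each eval next to each other, which is the left/right pairing mentioned above.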

Finally, I think the most useful thing was adding a self-reflection pass. After an eval is done, another agent looks at everything from that eval and tries to work out what went wrong along the way and what should be added to the skill and, conversely, from the without-skill run, what is in the skill that doesn't need to be. It produces a skill change recommendation file for each eval. A further summary agent then aggregates all those recommendations into a form I can feed back to an agent.


fwiw, just for fun I tried running the agent skill they provide to migrate an App Router-based Next 15 site, and the end result entirely failed to start.

Vite just hangs when running vinext dev, with no output in the logs whatsoever beyond printing `vinext dev (Vite 7.3.1)`.


Just curious, about how long did this project take you? I don't see that mentioned in the article.


We had our third kid in late November, and I worked sporadically on it over the following two months of paternity leave and holiday... If I had to bet, I'd say I put in well over 200 hours of work on it, the majority of that being manual auditing/driving of the generation process. If any AI model were reliable at checking the generated pixels, I could have automated this process, but they simply aren't there yet, so I had to do a lot more manual work than I'd anticipated.

All told I probably put in less than 20 hours of actual software engineering work, though, which consisted entirely of writing specs and iterating with various coding agents.


> If any AI model were reliable at checking the generated pixels, I could have automated this process, but they simply aren't there yet, so I had to do a lot more manual work than I'd anticipated.

Since the output is so cool and generally interesting, there might be an opportunity for those forking this for other cities to deploy a web app that crowdsources identifying broken tiles, and maybe classifying the error or even providing manual hints for the next run. It takes a village to make a (sim) city! :-)


Yeah I'll get the code out there soon - it's just very vibe-y right now, the repo is a bit of a mess since I never bothered to organize things. The secret sauce is really in the fine-tuning, can definitely get those datasets/models public on oxen.ai too


144, using css variables, with fallback and p instead of li

https://codepen.io/dsmmcken/pen/WbwYOEQ?editors=0100

p{counter-increment:n;--n:counter(n)}p:nth-child(3n){--f:"Fizz"}p:nth-child(5n){--b:"Buzz";--n:''}p::after{content:var(--f,var(--n))var(--b,'')}
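For anyone puzzling over how the variable fallbacks produce the output, here is a small Python model of the same logic. It is only a sketch of the CSS behavior, under the assumption that every sibling is a `p` so the counter matches the `nth-child` index; the function name `fizzbuzz_line` is made up for illustration.

```python
def fizzbuzz_line(n):
    # p:nth-child(3n){--f:"Fizz"}: --f is only defined on multiples of 3.
    f = "Fizz" if n % 3 == 0 else None
    # p:nth-child(5n){--b:"Buzz";--n:''}: --b is defined and --n is blanked.
    b = "Buzz" if n % 5 == 0 else None
    num = "" if n % 5 == 0 else str(n)
    # content: var(--f, var(--n)) var(--b, '')
    # The first var() falls back to the number, the second to an empty string.
    return (f if f is not None else num) + (b if b is not None else "")


for i in range(1, 16):
    print(fizzbuzz_line(i))
```

Blanking `--n` on multiples of 5 is what keeps the number from showing next to a bare "Buzz", since the first `var()` would otherwise fall back to the counter.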


Level 48 is the last level, and you get a pdf certificate proving you are human.


Two of those wishlist CSS features already exist as specs:

> n-th child variable

See sibling-index() and sibling-count() https://developer.mozilla.org/en-US/docs/Web/CSS/sibling-ind...

> Reusable blocks

See @function and @mixin draft spec, https://drafts.csswg.org/css-mixins-1/ and https://css-tricks.com/functions-in-css/

Both are available in chrome already.


The tool we use for our docs AI answers lets you mine that data for feature requests. It generates a report of what it didn't have answers for and summarizes them as potential feature gaps. (Or at least what it is aware it didn't have answers for).

People seem more willing to ask an AI about certain things than to risk being judged for asking a human the same question, so in that regard it does seem to surface slightly different feature requests than we hear when talking to customers directly.

We use inkeep.com (not affiliated, just a customer).


> We use inkeep.com (not affiliated, just a customer).

And what do you pay? It's crazy that none of these AI CSRs have public pricing. There should just be monthly subscription tiers, which include some number of queries, and a cost per query beyond that.


FYI the "watch video" button in the hero of https://codevideo.io/ doesn't work, missing the video ID.


Great catch - real link shipping to prod as we speak. Thanks for checking out the site!



For those unfamiliar: scrollbar-gutter: stable; https://developer.mozilla.org/en-US/docs/Web/CSS/scrollbar-g...


Man, this makes me feel old.

