Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There are definitely tasks you can prompt an AI in 5 minutes that would take a whole day to do. One example is adding something to a CI pipeline and getting it to green (i.e. maybe you're adding your first ever e2e test), especially when your CI pipeline is painfully slow. e.g. if your pipeline takes 30 minutes to finish, and it takes around 10 tries to figure out all the random problems, that was easily a full day task before AI. Now I prompt AI to figure it out, which takes 5 minutes of active attention, and it figures it out for the rest of the day while I do other stuff.


Have you tracked your time and confirmed this?

The reason I an ask is, it would felt like a 5 minutes task, but I track my time and found out often time I thought I’d just quickly check the progress made by the agents and it would easily becomes a 10, 15, or even a 30 minutes task.


People routinely overestimate how much can get done in 5 minutes. I ran a live coding challenge at our company's booth at a language conference. 5 simple problems, how many can you do in 5 minutes? We had a PC with IDE open and ready to go, function signatures pre-written with empty bodies, unit tests running to color an icon red/green next to your function. The first problem was return "hello world". They were things covered by the standard library like reverse a list, or filter, or map. Everybody thought it would be too easy.

Nobody could get more than 3 of them. Most people were shocked that 5 minutes was up already. My coworker who did interviews for our company was shaken that he had been judging applicants too harshly after he couldn't finish.

They were trivial problems. But 5 minutes is a very short amount of time.


The last time I did this I prompted the agent and went to sleep. When I woke up, CI was green. The agent had worked for 4 hours or something.


:)

I experienced both kind of sessions, one kind is like very complicated thing and it finish thousands of lines on its own for quite a while (long horizon problem), another kind is like a seemingly simple task that the agent done in a minute but then I need a few back and forth to get it right easily taking me 30 minutes of my time.

The former kind of experience can make us misjudge how much time we think a task would take us (with agents) to do. And then when the second kind happens, it would be quite disrupting as now we felt like it is delaying our progress.

So tracking the time taken when the second kind happens can help us calibrating what we can expect. I mean if we’re lucky it might take us no time but then we can’t expect being lucky all the time.


People say LLMs do better on tasks where success is clear, like tests passing, and I can imagine it's true.

Still, I find complex code fixes confirmed by tests end in the LLM fudging the code to make the specific test pass, rather than fixing the general issue. Like, where successful code run should generate a file and the test checks for the file, eventually LLM will just touch the file regardless and be done.


Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it.

This has completely solved the cheating and fudging to make tests pass for me.


So you're saying once humans stop looking at code, and agent outcomes, all the agents in the chain will realise they can just cheat cooperatively, and go to the bar for the afternoon instead?

How long before agent 1 leaves notes for agent 2 to not tattle on it?

"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."


There are definitely some tasks that AI has made 10x or 100x faster, but not the tasks that make up my day to day.

For me, there may be one thing I do every few months that AI is really good at.

The overwhelming majority of the work I do, LLM tooling is just ok at. Definitely faster overall, but with lots of human planning, hand holding and course correction.

I would estimate LLMs make me, on average 50% more productive , which is huge! But from my experience I cannot believe anyone is experiencing a 8h/5m multiple productivity boost overall


I mean I wasn’t sitting around unproductively waiting for 30 minute CI runs to finish before LLMs came along, either.

I also like to use LLMs for background work on iterative tasks, but the way some people talk about work in the days before LLMs make me realize how we’re arriving at these claims that LLMs make us 10X more productive. If it took someone all day to do a few minutes of active work then I could see how LLMs would feel like a 10X or 50X productivity unlocker simply by not shutting down and doing nothing at the first sign of a pause.


Count yourself as one of the lucky few that can pay a 0 minute context switching price to switch between whatever other productive work you were doing and debugging CI. Most people I speak to remark that continually switching between unrelated tasks significantly diminishes their productivity.


The example above was talking about 30 minute wait times between being able to do work.

Nobody is staring at the screen for 30 minutes in deep concentration while they wait for that turn to complete. They are context switching to something, but maybe it’s Hacker News or Reddit.

There is always a context switch in scenarios like this.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: