Nice post! You piqued my curiosity, so after a bit of research it turns out that, with techniques like MTP/MLA/CSA, it's quite probable that these models are much more efficient (and maybe bigger? tho 400B sounds about right) than a simple RAM breakdown would suggest.
These techniques are used by DeepSeek, and work well with the commodity (NVIDIA) GPUs they use.
Google designs their entire AI stack from the custom silicon up, so they have different optimization approaches.
(Though Gemma does use MTP)
That's me with Google Antigravity. Switching back to vscode was such a breath of fresh air. Porting over my (extensive) settings/extensions/keyboard shortcuts was extremely easy too (just ask the agent to do it), and now I can use both Copilot models and Claude Code easily. More to your point though, the speed and stability are incomparable. I can't remember having many issues with Cursor last year when I used it at my last job, but still, vscode has been surprisingly pleasant for agentic use.
I'm working on Coderbase (https://coderba.se/), a platform for running technical interviews. It started with live interviews cuz that's what I know best, having run over 3,000 interviews in my career, but I made it easy af to run this yourself too. I initially pictured it as a tech-heavy product (and it is), but my second client is a large recruitment agency that's using it both for internal interviews (for recruiters) and external ones (for candidates they're presenting to clients).
I didn't set out to do this. After I got laid off in December, a client quickly fell in my lap: a small startup in the middle of a massive investment round that needed to hire 25 people immediately, with only a CTO available for interviews. I created their content and ran their interviews while building the software at the same time. It started as Google Meet + CoderPad + Calendly and gradually became an in-house system. Unlike Proton (lol), I'm not pretending I built my own video call solution from scratch; it's just an off-the-shelf 100ms integration.
The content is all versioned and structured, which makes it fast to iterate on and easy to reason about. We use major.minor versions and only bump the major for backwards-incompatible changes, or changes big enough that comparing interviews stops making sense. Otherwise, any combination of question versions inside an interview format is considered comparable if the major versions are identical.
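A rough sketch of that comparability rule, not the actual Coderbase code. The `QuestionVersion` type, the `comparable` function, and the question ids are all made up for illustration, and it assumes both interviews come from the same format (i.e. the same set of questions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionVersion:
    """Hypothetical record: one question pinned at a major.minor version."""
    question_id: str
    major: int
    minor: int

    @classmethod
    def parse(cls, question_id: str, version: str) -> "QuestionVersion":
        # "2.1" -> major=2, minor=1
        major, minor = version.split(".")
        return cls(question_id, int(major), int(minor))


def comparable(interview_a, interview_b) -> bool:
    """True if both interviews use the same questions with identical
    major versions. Minor versions are ignored on purpose."""
    majors_a = {q.question_id: q.major for q in interview_a}
    majors_b = {q.question_id: q.major for q in interview_b}
    return majors_a == majors_b


# Same majors, different minors: still comparable.
a = [QuestionVersion.parse("q-arrays", "2.1"), QuestionVersion.parse("q-cache", "1.0")]
b = [QuestionVersion.parse("q-arrays", "2.4"), QuestionVersion.parse("q-cache", "1.3")]
# A major bump on one question breaks comparability.
c = [QuestionVersion.parse("q-arrays", "3.0"), QuestionVersion.parse("q-cache", "1.3")]
```

Here `comparable(a, b)` holds while `comparable(a, c)` does not, since the first question moved from major 2 to major 3.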
The interview itself is highly structured: once you define a format from the content library and the various knobs you can adjust, you can schedule interviews and run them using our integrated "room" (video call + multiplayer code editor, both recorded, with transcripts and playback) and "rubric" (the tool the interviewer uses for content, scoring, and notes during the interview). Once you submit/publish the interview, a report is generated immediately. Example: https://coderba.se/sample
Two interesting AI bits:
- "AI linting": a way to benchmark interview questions by running a candidate model and an interviewer model against each other. The candidate closely follows a defined skills profile, then we compare actual vs expected performance. More here: https://coderba.se/blog/product-update-unit-testing-the-inte...
- "AI draft": once an interview ends, it takes ~30s for the video and transcript to become available. Then we use basically every relevant artifact from the interview, with a PII redaction pass first: questions, scoring, incomplete rubric, transcript, code editor history. We send that through our LLM gateway, currently mostly using DeepSeek because the quality/value is insane, though I may switch to Mistral to stay on the better side of privacy. It sends back recommended scoring + writeup, which we present as Cursor-like suggestions you can accept/reject/edit.
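To make the "actual vs expected" comparison in the AI linting bit a little more concrete, here's a minimal sketch. Everything in it is an assumption for illustration: the `lint_question` name, the per-skill numeric scores, and the tolerance are invented, and the step that actually runs the two models against each other is left out.

```python
def lint_question(
    expected: dict[str, float],
    actual: dict[str, float],
    tolerance: float = 0.5,
) -> dict[str, float]:
    """Return the skills where the synthetic candidate's actual score
    differs from the score its skills profile predicts by more than
    `tolerance`. Values are signed deltas (actual - expected)."""
    flagged = {}
    for skill, target in expected.items():
        delta = actual[skill] - target
        if abs(delta) > tolerance:
            flagged[skill] = delta
    return flagged


# Hypothetical profile: strong at SQL, weak at debugging.
expected = {"sql": 4.0, "debugging": 2.0}
# Hypothetical scores the interviewer model gave after the run.
actual = {"sql": 4.0, "debugging": 3.5}

flagged = lint_question(expected, actual)
```

A non-empty result (here, `debugging` scoring well above what the profile predicts) would suggest the question isn't discriminating on that skill the way it was meant to.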
Hey HN, wanted to share an early peek at what I've been building over the past few weeks.
I've spent the last 7 years conducting over 3,000 technical interviews. Along the way, I made the (utterly insightful) observation that most hiring processes are broken, often relying on vibes and ad-hoc trivia. Coderbase is in large part my attempt at treating hiring and interviewing as a rigorous systems engineering problem rather than a simple outsourcing of the thick side of the hiring funnel.
I'm essentially attempting to distill thousands of hours of interview data into sensible processes that can lead to meaningful improvements to hiring. The platform underneath it all is early, but we're getting close to being able to scale the system to e.g. 1000 interviews per week without (hopefully) breaking a sweat.
I plan to publish these weekly, as working on this technical challenge is exhilarating (I already have enough material for a second post but it's got to wait until next week). For now, let me know what you think of my attempt to weaponise AI slop (aka "synthetic interview benchmarking").
Otta in the UK (now eaten by the inexplicably-named Welcome to the Jungle) used to have a very involved vetting process during company onboarding, and I could verify that it was a great service as both a candidate and a hiring manager. To replicate what you want ("every listing is verified") there's no silver bullet but a good vetting process like that goes a long way.
Another site I like is cord.com, which seems to prioritise companies where recruiters are active on that website. I've had a good experience with that one as well, as you get to chat with an actual recruiter in a matter of hours or days.
Quite chuffed someone else mentioned Djokovic, who is close to 39 and just played an Australian Open final. (Yes he got lucky with 2 freebies but he _did_ beat Sinner in the semifinal fair and square, and managed to win the first set before running out of juice)
He probably means when I took VC funding in 2019 and started to rip apart the framework to try to build a platform and business. The 2-3 years after were very chaotic.
My goal was never to serve the community but instead leverage it to build a business. Ultimately that failed. The truth is it's very difficult to sustain open source. Go-micro was never the end goal. It was always a stepping stone to a platform, e.g. a microservices PaaS. A lot of hard lessons learned along the way.
Now with Copilot and AI I'm able to go back and fix a lot of issues but nothing will fix trust with a community or the passage of time. People move on. It served a certain purpose at a certain time.
Note: The company behind connect-rpc raised $100m, though more for a build system around protobuf than for the RPC framework itself. Still, this was my thinking as well: the ability to raise $10-20m would create the space to build the platform off the back of the success of the framework.
Obligatory "this is why I love HN" but even by that standard, this is an incredibly open account, thank you for the insight and sorry it hasn't seemed to pan out quite how you hoped. Still sounds like you got your bag, built something cool, and have your "micro" share of Internet legacy, so not too bad eh?
I've interviewed 3k people with Karat as a professional interviewer, and several hundred more as a hiring manager. The very few times I received direct emails from candidates attempting to circumvent the normal process were met with unequivocally negative reactions. First, I find the Internet sleuthing they'd have to do to find my email address a bit creepy – for example, Karat would only show the first name and profile pic for your interviewer. But more importantly, the sheer audacity to go for such a stunt would firmly anchor them in the box of people I'd never want to work with. I'd still be polite and professional to a fault, of course, but I'd never seriously consider them past that point.
But in bad times where "the normal process" can't even let you have a human look at your resume, it's different. "Circumventing" is at worst a simple act of rebellion to annoy people who can change their process. At best, it's a chance to actually get the response that isn't even granted with a form rejection these days.