Wow, not one mention of the env vars that have a far greater influence on how the models actually work under the hood - https://code.claude.com/docs/en/env-vars
Very important for bedrock deployments and other not-as-standard deployments
Key for how I've deployed it - disable adaptive thinking, max thinking tokens, disable telemetry, etc
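For anyone who wants the shape of that setup, here is a rough sketch that builds the environment before launching the CLI. The variable names are as I recall them from the linked page, and `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` in particular is an assumption to check there. The token budget is an arbitrary placeholder and `claude_env` is a made-up helper name.

```python
import os

def claude_env(base=None):
    """Return a copy of the environment with the overrides mentioned above."""
    env = dict(os.environ if base is None else base)
    env.update({
        "CLAUDE_CODE_USE_BEDROCK": "1",   # route requests through Bedrock
        "MAX_THINKING_TOKENS": "8000",    # placeholder budget, pick your own
        "DISABLE_TELEMETRY": "1",         # opt out of telemetry
        # Assumption: verify this name against the env-vars page.
        "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1",
    })
    return env

# Usage (not run here): subprocess.run(["claude"], env=claude_env())
```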
GPT-5.5 is a solid leap with Codex or other harnesses. Opus 4.7 I still don't understand how people use... I tried it for a day or two, have tried it for a few hours every week or so since release, and still use 4.6 as daily driver (with xhi thinking).
As with these daily opinion threads, ymmv. I find GPT's code to be competent, but its voice isn't great. If Claude can be a little too cool, GPT-5.x often reads like 90s era movie hacker technobabble. This has got to be RLHF/alignment and the sort of tone that people like. Also anecdotally I used xhigh for a while and turned it down to medium because it would take so long to do even simple jobs. The instruction following is quite good with 5.5 so there isn't too much need to let it wander off.
It could be an overflow, but one related to the frequency at which the register was increasing rather than the max value of the register. E.g. +1 to this uint16 (max 65535) once every 500,000 cycles on this 32 MHz chip, where the same code previously ran on a 1 MHz chip and never had a problem.
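To put numbers on that hypothetical (the 500,000-cycle increment and the two clock speeds are just the example figures above, not measurements from any real chip):

```python
UINT16_STATES = 2 ** 16  # a uint16 wraps back to 0 after 65536 increments

def seconds_until_wrap(clock_hz, cycles_per_increment=500_000):
    """Time for a uint16 counter to wrap if it ticks once per N clock cycles."""
    increments_per_second = clock_hz / cycles_per_increment
    return UINT16_STATES / increments_per_second

old = seconds_until_wrap(1_000_000)    # 1 MHz part
new = seconds_until_wrap(32_000_000)   # 32 MHz part

print(f"1 MHz:  wraps after {old / 3600:.1f} h")    # 9.1 h
print(f"32 MHz: wraps after {new / 60:.1f} min")    # 17.1 min
```

So a device that was never left running for nine hours straight would never wrap at 1 MHz, but would wrap within about 17 minutes at 32 MHz.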
apparently you can straight up duplicate/add/rearrange layers without changing any of the weights and get better results as well - https://dnhkng.github.io/posts/rys/
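A toy illustration of what "duplicate layers without changing weights" means structurally. This is only the general idea, not the procedure from the linked post; `Layer` and `repeat_block` are made-up names and the layers here do nothing.

```python
class Layer:
    """Stand-in for a transformer block; only the identity matters here."""
    def __init__(self, name):
        self.name = name

def repeat_block(layers, start, end):
    """Return a new stack in which layers[start:end] runs twice in a row.

    The repeated entries are the same objects, so no weights are copied
    or modified; the forward pass just visits them a second time.
    """
    return layers[:end] + layers[start:end] + layers[end:]

stack = [Layer(f"L{i}") for i in range(6)]
deeper = repeat_block(stack, 2, 4)

print([layer.name for layer in deeper])
# ['L0', 'L1', 'L2', 'L3', 'L2', 'L3', 'L4', 'L5']
```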
> This is probably due to the way larger numbers are tokenised, as big numbers can be split up into arbitrary forms. Take the integer 123456789. A BPE tokenizer (e.g., GPT-style) might split it like: ‘123’ ‘456’ ‘789’ or: ‘12’ ‘345’ ‘67’ ‘89’
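A quick way to see how many shapes that can take: the toy enumerator below lists every way to cut a digit string into chunks of 1 to 3 digits. It is not a real BPE tokenizer (a real one picks a single segmentation from its learned merges); the 3-digit cap and the `segmentations` name are assumptions for illustration.

```python
def segmentations(digits, max_len=3):
    """Yield every way to cut `digits` into chunks of 1..max_len characters."""
    if not digits:
        yield []
        return
    for n in range(1, min(max_len, len(digits)) + 1):
        for rest in segmentations(digits[n:], max_len):
            yield [digits[:n]] + rest

splits = list(segmentations("123456789"))
print(len(splits))                              # 149
print(["123", "456", "789"] in splits)          # True
print(["12", "345", "67", "89"] in splits)      # True
```

Which of those 149 a given tokenizer lands on depends on its vocabulary, not on place value.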
xVal basically says "tokenizing numbers is hard: what if instead of outputting tokens that combine to represent numbers, we just output the numbers themselves, right there in the output embedding?"
It works! Imagine you're discussing math with someone. Instead of saying "x is twenty five, which is large" in words, you'd say "x is", then switch to making a whistling noise in which the pitch of your whistle, in its position within your output frequency range, communicated the concept of 25.00 +/- epsilon. Then you'd resume speech and say "which is large".
I think the sentiment is that today's models are big and well-trained enough that receiving and delivering quantities as tokens representing numbers doesn't hurt capabilities much, but I'm still fascinated by xVal's much more elegant approach.
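A minimal sketch of the input side of that idea, as I understand it: every number in the text becomes one shared [NUM] token, and that token's embedding is scaled by the numeric value. The regex, the 2-dimensional embeddings, and the function names are all made up for illustration; as far as I recall the paper also rescales values to a bounded range first, which is skipped here.

```python
import re

NUM_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")

def split_numbers(text):
    """Replace each numeric literal with [NUM]; return the text and the values."""
    values = [float(m) for m in NUM_PATTERN.findall(text)]
    return NUM_PATTERN.sub("[NUM]", text), values

def embed(tokens, values, table):
    """Look up each token; scale the [NUM] embedding by its numeric value."""
    remaining = iter(values)
    out = []
    for tok in tokens:
        vec = table[tok]
        if tok == "[NUM]":
            scale = next(remaining)
            vec = [scale * v for v in vec]
        out.append(vec)
    return out

text, values = split_numbers("x is 25 which is large")
tokens = text.split()
table = {tok: [1.0, 0.5] for tok in set(tokens)}  # toy embedding table
embedded = embed(tokens, values, table)

print(text)         # x is [NUM] which is large
print(embedded[2])  # [25.0, 12.5]
```

The output side mirrors this: instead of picking digit tokens, the model emits [NUM] and a separate head produces the scalar, which is the "whistle" in the analogy above.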
There have been a couple of "studies" comparing various frontier-tier AIs that have led to the conclusion that Chinese models are somewhere around 7-9 months behind US models. Another comment says that Opus will be at 5.2 by the time Qwen matches Opus 4.5. It's accurate, and there is some data to show by how much.
Personally was always a fan of just going with the largest fans possible - surprised we don't see more cases designed around 140mm and larger. 200mm is much less common but has a more pleasing noise profile
I'm also a fan of that sort of setup. A Fractal Meshify 2 XL will fit a bunch of 140mm fans, or you can get the Torrent which is smaller but has 2x 180mm fans up front. I have both and would recommend them, though the Torrent is a tight fit for a big board, and the shield on the back of the Asus W790 motherboards interferes with the cable routing grommets on the motherboard tray, so you have to remove them.
Noctua makes really good fans, I'm told. Want to get on their level and make a similar amount of money? In a world of slop, quality engineering is valuable.
Using this with tmux and various VPN tech. Main issue is scrolling. Termius + tmux don't scroll very well. And I've been led to believe tmux is necessary to keep sessions open when I turn off my phone screen
Scrolling is quite janky with Termius - I thought there was a way to keep sessions going through intermittent drops in connection via Termius, but for how I've been building, when I lose connection I just restart claude and re-explain the context of the task.
I’m the Blink developer and really curious. Which way do you think Blink is inferior? I think the ssh/mosh tooling is way more powerful, keyboard config, etc. But would love to improve what I can. Currently working on new UI and better access to hosts, so it is a good time.
Sorry to have missed this. First, props that Blink is indeed superior in that it actually works.
My recollection is that the primary way Termius "feels better" is the pre-terminal UI (configuring hosts, etc).
But I'm still a daily Blink user, so I don't 100% recall what Termius did to make me think it was nicer, I just remember that scrolling was fucked so I gave up on it.
Thanks! How I eventually found it was stripping stuff back layer by layer. And by that I mean I started with the raw camera feed and got to where things worked well in a different swift view. And then from there, peeled stuff back from the main process feature by feature. And then bam, aircraft were exactly where they should be (minus the compass inaccuracy). I even had stuff like drawing mountain peaks (I live near Denver) as "aircraft" to figure things out, determining different FOV at different zoom levels (a lot of AI keyed in where the boxes would move in one direction at low zooms, be completely correct at some middle zoom, and then in the opposite direction at high zoom).
And that peeling back was me looking at each function to see what it did (I am a dev, but not for SwiftUI). So yep, can't vibe code it all!
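For the FOV-per-zoom part, here is the geometry as a sketch, assuming an ideal pinhole camera where zoom is a centre crop. The function names and the 70 degree base FOV are placeholders, not values from the app. Under that model FOV does not scale as base/zoom, and screen position goes with the tangent of the angle, so a linear shortcut can be right at one zoom level and off in opposite directions on either side of it.

```python
import math

def fov_at_zoom(base_fov_deg, zoom):
    """Field of view after zooming in by `zoom`x (pinhole model, centre crop)."""
    half = math.radians(base_fov_deg) / 2
    return math.degrees(2 * math.atan(math.tan(half) / zoom))

def screen_x(offset_deg, fov_deg, width_px):
    """Horizontal pixel for a target `offset_deg` to the right of the optical axis."""
    half_fov = math.radians(fov_deg) / 2
    frac = math.tan(math.radians(offset_deg)) / math.tan(half_fov)
    return width_px / 2 * (1 + frac)

base = 70.0  # placeholder base FOV in degrees
for zoom in (1, 2, 5):
    naive = base / zoom
    exact = fov_at_zoom(base, zoom)
    print(f"zoom {zoom}x: naive {naive:.1f} deg, pinhole {exact:.1f} deg")
```

A target dead ahead maps to the centre pixel and a target at half the FOV maps to the right edge, which makes a quick sanity check against fixed landmarks like the mountain peaks.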