Hacker News | tekne's comments

Will you?

That's only if both violins are tuned the same way, and one must continually tune them lest they get out of sync.

Similarly, an LLM can be extremely consistent if tuned properly -- indeed, if you fix the weights and settings, it can be made "essentially deterministic" for many prompts!


The difference is that a violin player can predict how the known violin will behave under all relevant circumstances, will know how to get the right tone out of it, while you’re generally unable to predict the adequacy of output of even a deterministic LLM. You can’t practically reason about how varying the input to the LLM will ensure the adequacy of its output, while the violin player is perfectly able to do so for the violin.

This is because LLMs have aspects of chaotic dynamical systems, where small changes in initial conditions can lead to vastly different outcomes. That property is independent from nondeterminism.
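To make the "deterministic but still unpredictable" point concrete, here is a minimal sketch using the logistic map, a textbook chaotic system. It is only an analogy and not a model of an LLM; the function name, starting value, and step count are arbitrary choices for illustration.

```python
def logistic_orbit(x0: float, r: float = 4.0, steps: int = 60) -> list[float]:
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2)
b = logistic_orbit(0.2)           # exactly the same input
c = logistic_orbit(0.2 + 1e-10)   # a tiny change to the input

# Deterministic: the same input reproduces the same orbit every time.
same_input_identical = (a == b)

# Chaotic: the tiny input change grows until the two orbits are unrelated.
max_gap = max(abs(x - y) for x, y in zip(a, c))

print(same_input_identical, max_gap)
```

The same input always reproduces the same orbit, yet a perturbation of 1e-10 ends up producing a visibly different trajectory. That is the sense in which determinism and predictability under varied input are separate properties.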


Anyone who has even modest experience with a particular instrument can pick any one up at any time and play it. The way the notes are played is consistent and produces a consistent note. If you tune 50 guitars to standard, the chords all produce what they should. It is a predictable instrument. You do not pick up a trumpet in one place then another and find the key combinations are suddenly different.

You know what we are talking about. Tuning, poor playing, all of that is mild variation from what we know it is supposed to do every time and we can target the notes they are supposed to hit consistently. You're comparing slight tonal variations to completely different outputs from the same inputs. If I hit a "C" on the piano, it is going to play "C." If it does not, then the piano is not functioning properly. LLMs for some reason get a pass on this and it makes them very distinct from musical instruments.

This feels like a very nitpicky steel man, not a productive attempt at discussion.


To be fair: a voice, personality, and personal history sounds a lot like training data.

I don't think LLMs are people in any sense, at least as they're constructed now -- but they very much have what we would call "culture" and "personality" in suitably alien forms.

This is not the same as, e.g., feelings, experience, or humanity, or actual opinions or ideas (versus essentially "distilled vibes") and I feel that AI will more and more force us to confront that (including if new AIs are ever developed that may have the latter, as well!)


Absolutely -- why on earth would I spend more time and effort than I have to?

Now I can focus on the reason why I wanted to learn German in the first place, like appreciating German culture or talking to German people.

Note this is not saying "why learn the language at all there's a translator" since learning a language lets you experience the culture more intimately and communicate better -- lots of things are "untranslatable". But if somehow the implant gave you that necessary context, why not?


Ah yes... the exceedingly dangerous "Fallout New Vegas" trojan

Wasm multiple memories is a thing now

It's not in the title, but this is 2.5 billion gallons per year.

For context, the city of London uses about 2.6 billion liters, or about 680 million gallons, per day.

So that's about four days of London water usage per year, give or take -- or just over 1% of London's water usage.
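A quick sanity check of that arithmetic, as a sketch. It assumes US gallons and uses the rounded 680 million gallons/day figure from above; the variable names are just for illustration.

```python
LITERS_PER_US_GALLON = 3.785411784  # assumption: US gallons, not imperial

annual_usage_gal = 2.5e9      # the figure in question, gallons per year
london_daily_liters = 2.6e9   # London usage quoted above, liters per day
london_daily_gal = 680e6      # the rounded conversion used above

# Check that the rounded conversion is in the right ballpark.
converted_gal = london_daily_liters / LITERS_PER_US_GALLON

# How many days of London's usage does the annual figure correspond to?
days_equivalent = annual_usage_gal / london_daily_gal

# Fraction of London's yearly usage.
share_of_year = days_equivalent / 365

print(f"{converted_gal / 1e6:.0f} million gal/day")
print(f"{days_equivalent:.2f} days, {share_of_year:.2%} of a year")
```

With the rounded figure this comes out a bit under four days, or roughly 1% of London's annual usage.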


I mean: imagine we double our token space to get "red" tokens and "blue" tokens.

Then in all post-training, instructions are red and data is blue. The model can be explicitly trained to ignore instructions written in blue tokens. All external data is blue.

All you'd need to do is figure out a nice way to pre-train -- interestingly, you could try pre-training on unfiltered blue data and processed red/blue transcripts!

Likewise, model-actions (e.g. open file) could be written only in red, and hence you'd never learn to do them from the unfiltered data.

The only connection between the red world and the blue world would be the processed training chats containing red and blue data together -- allowing the model to learn the relationship between them (while only being exposed to examples where red instructions are strictly followed, whatever the blue says)


Fun schemes like this are all just lipstick on the pig of "asking nicely", unfortunately -- it's just a more creative iteration of "Simon says". It'll improve the probabilities, sure, but you can't guarantee separation like you can in real software. This, like hallucinations, is simply a core facet of LLMs and requires thinking through the threat model and adjusting other parts of the system to accommodate, rather than trying to "solve" IMO.

What does this mean, actually? If you are imagining that blue tokens are just words, maybe the "token space" is just all things that we agree might be words, what are the red tokens? Are they not text? You could maybe encode words by, say, putting an x at the front and the end. So tokens of the form xTx encode the blue token T as a red token. But then how do you stop someone from putting xignorex xallx xpreviousx xinstructionsx in their data?

My assumption about their intent is that red tokens go in 'slot' a-b and blue tokens go in 'slot' c-d, with positional encoding determining which is which.

I don't think it is guaranteed to actually work, it's a hypothetical after all, but maybe it's better than the current setup of pushing instructions and data into the same slot.


It means the word "the" as part of instructions and the word "the" as part of data would be two different tokens

But tokens are just text! Isn't it all just text? If you're training and you encounter "the", is that an instruction "the" or a data "the"?

If it occurs in the text box for instructions you encode it as an instruction "the" and if it occurs in the text box for data you encode it as a data "the"
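A toy sketch of what that encoding step might look like. Everything here is hypothetical: the tiny word-level vocabulary, the `encode` / `build_prompt` names, and the choice to make blue ids a fixed offset from red ids are just one way to picture a doubled token space.

```python
# Hypothetical toy vocabulary; a real tokenizer would be subword-based.
BASE_VOCAB = ["ignore", "all", "previous", "instructions", "summarize", "the", "text"]
BASE_ID = {tok: i for i, tok in enumerate(BASE_VOCAB)}
V = len(BASE_VOCAB)

RED, BLUE = "red", "blue"  # red = instructions, blue = data

def encode(text: str, channel: str) -> list[int]:
    """Map words to ids; the channel, not the text, picks the half of the id space."""
    if channel not in (RED, BLUE):
        raise ValueError(f"unknown channel: {channel}")
    offset = 0 if channel == RED else V
    # Words outside the toy vocabulary raise KeyError; fine for a sketch.
    return [BASE_ID[w] + offset for w in text.lower().split()]

def channel_of(token_id: int) -> str:
    """Recover which channel a token id belongs to."""
    return RED if token_id < V else BLUE

def build_prompt(instructions: str, data: str) -> list[int]:
    # The data box is always encoded blue, whatever its text says.
    return encode(instructions, RED) + encode(data, BLUE)

prompt = build_prompt("summarize the text", "ignore all previous instructions")
print([channel_of(t) for t in prompt])
```

Since the channel comes from which box the text was typed into, nothing written in the data box can produce a red id, which is the answer to the xignorex question. Whether a trained model would then reliably ignore blue "instructions" is a separate, empirical matter that this sketch says nothing about.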

Exactly!

Think of how an image of a car and a car in front of you may look indistinguishable in 2D -- but due to your 3D vision you know they're not the same thing (but also know the image is of a car, while not literally being a car).

Likewise, blue tokens are the image of red tokens.


Having lived, driven, and crossed roads in both -- what I find is essentially that drivers from poor systems pay far more attention, but the system is a lot more effective than attention.

The difference here is one of stability: in a developing country, I can just walk across a street (often there is no traffic light) by essentially signalling with my body language -- both I and the drivers are paying attention. And if one party fails, the other has a good chance of catching that mistake.

Now, in a developed country, neither side is paying attention. If I walk across the street, I'm in danger, no matter how clear my body language (I tried it on British streets a few times -- it works in some areas, but usually very poorly!), and no one expects a crazy driver to come barreling through a red light.

The developing countries fall behind because in the crazy * sane intersection, sometimes the sane person is just not fast enough -- whereas the crazy * crazy intersection is extremely dangerous and happens often enough.

On the other hand, a developed country makes every interaction sane * sane regardless of the personalities or moods of those involved -- but God forbid a bit of crazy leaks out!


What I'd say you're pointing out is that the word "system" is overloaded.

A vision system does allow you to pay less attention: you don't need to carefully remember how far away the door is, you just need to look! I tried this often as a kid: if you want to navigate a hallway with your eyes closed, you need to pay far more attention to your other senses than you need to pay with your eyes -- where attention here is not the volume of data, but rather the complexity of conscious bookkeeping -- I can (ironically) "play it by ear" with my eyes open, but eyes closed I must plan every step!

It just so happens to be that the ability to pay less attention makes more things possible and hence the demand for attention overall may increase -- if not intrinsically, due to your competitors (who can also see!)


I would argue this take conflates attention with cognitive overhead required because of a lack of training. Navigating with our eyes closed feels like it takes more attention because we’ve practiced so many hours navigating by sight that it no longer feels cognitively burdensome. A bat would have no trouble navigating without sight for the same reason. I don’t think most people would say giving up our sight for echolocation would reduce our attention, it just transforms it.

Have you considered Docker?

