Hacker News
Outcome-Based Reinforcement Learning to Predict the Future (arxiv.org)
96 points by bturtel 1 day ago | 14 comments





Do you want paperclips? Because this is how you get paperclips!

Eliminate all agents, all sources of change, all complexity - anything that could introduce unpredictability, and it suddenly becomes far easier to predict the future, no?


> Do you want paperclips? Because this is how you get paperclips!

Don't^W worry, there are many other ways of getting paperclips, and we're doing all of them.


Even explaining how not to get paper clips gets you paper clips when you can invert the loss function. Paper clips for everyone!

I don't know. Paperclips are awful useful. Would it be so bad to build more of them?


That's all fun and games until the paperclip maximizers start looking at your blood as a source of iron.

So instead of next-token prediction it's next-event prediction. At some point this just loops around and we're back to teaching models to predict the next token in the sequence.

Tokens are an awfully convenient way to describe an event.

Tokens are just discretized state representations.

It’s the next state. So instead of spitting out words, it will spit out a whole movie, or a sequence of world states in a game or simulation.
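
The reduction is mechanical: pick a discretization of the state space, give each bin a token id, and a trajectory of events becomes an ordinary sequence for an autoregressive model. A minimal sketch of the idea; the bin count and value range are illustrative assumptions, not anything from the paper:

    import numpy as np

    N_TOKENS = 256
    edges = np.linspace(-1.0, 1.0, N_TOKENS + 1)  # 257 bin boundaries
    centers = (edges[:-1] + edges[1:]) / 2        # one center per token

    def event_to_token(x: float) -> int:
        # Map a continuous state value to a discrete token id in 0..255.
        return int(np.clip(np.digitize(x, edges[1:-1]), 0, N_TOKENS - 1))

    def token_to_event(token: int) -> float:
        # Approximate inverse: the center of the token's bin.
        return float(centers[token])

    # A trajectory of world states is now an ordinary token sequence,
    # which any autoregressive LM can be trained to continue.
    trajectory = [0.1, 0.3, -0.2, 0.8]
    tokens = [event_to_token(x) for x in trajectory]
    decoded = [round(token_to_event(t), 3) for t in tokens]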

From the abstract:

> A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037).

I'm lazy: is this hypothetical shooting fish in a barrel, or is it a real edge?
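
For context, the thread doesn't reproduce the paper's rule, but a common baseline on binary prediction markets is to buy whichever side the model thinks is mispriced by more than some threshold, then settle against the resolved outcome. The threshold and payoff accounting below are illustrative assumptions, not the paper's actual rule:

    # Hedged sketch of a "simple trading rule" on $1 binary contracts.
    def trade(p_model: float, p_market: float, threshold: float = 0.05) -> str:
        # Buy YES when the model sees the event as underpriced,
        # NO when overpriced, otherwise stay out.
        if p_model - p_market > threshold:
            return "buy_yes"   # pay p_market, receive $1 if event happens
        if p_market - p_model > threshold:
            return "buy_no"    # pay 1 - p_market, receive $1 if it doesn't
        return "no_trade"

    def payoff(action: str, p_market: float, outcome: bool) -> float:
        # Profit per $1 contract given the resolved outcome.
        if action == "buy_yes":
            return (1.0 if outcome else 0.0) - p_market
        if action == "buy_no":
            return (0.0 if outcome else 1.0) - (1.0 - p_market)
        return 0.0

    # Example: model says 70%, market says 60%, event resolves YES.
    a = trade(0.70, 0.60)
    print(a, payoff(a, 0.60, True))  # buy_yes 0.4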


Note the 'hypothetical profit' part. I know of several groups looking for opportunities to skim off LLM traders, leveraging their limited sensitivity, limited expressiveness, and loss of tail data.

Predictive AI is problematic no matter what tool you use. It's great at demoware that doesn't deliver.

I am sure there are use cases, but it would be augmentation, not a reliable approach by itself.


Why would you use RL if you're not going to control the environment, but just predict it?

Because they're training a predictor, not an agent?
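
Concretely, "training a predictor" with outcome-based RL can mean the model's only reward is a proper scoring rule applied once the real-world question resolves; nothing in the environment is acted upon. A minimal sketch, assuming a negative Brier score as the reward (the paper may use a different scoring rule):

    # The model emits a probability; the world resolves; the reward scores
    # the forecast. Proper scoring rules are maximized by honest,
    # calibrated probabilities, so optimizing them trains a predictor.
    def brier_reward(p_forecast: float, outcome: bool) -> float:
        y = 1.0 if outcome else 0.0
        return -((p_forecast - y) ** 2)

    # RL-style reward computation over resolved questions.
    history = [(0.8, True), (0.3, False), (0.9, False)]
    for p, resolved in history:
        r = brier_reward(p, resolved)
        # In practice this reward would drive a policy-gradient update of
        # the LM that produced the forecast; here we just log it.
        print(f"forecast={p:.2f} outcome={resolved} reward={r:.3f}")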


