Very well-made video about the somewhat mysterious phenomenon of 'grokking' in transformers, where they quickly memorize all the training data within a few hundred steps yet still fail completely on the test data, then seem not to improve at all for thousands of steps before suddenly generalizing perfectly: https://www.youtube.com/watch?v=D8GOeCFFby4
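For anyone who wants to poke at the effect themselves, here's a minimal sketch of the setup the grokking papers usually use: a tiny transformer trained on modular addition with a small training split and heavy weight decay. The model, hyperparameters, and step counts below are illustrative assumptions on my part, not the exact configuration from the video.

```python
# Minimal grokking sketch: tiny transformer on (a + b) mod p with a small
# training fraction and strong weight decay. Hyperparameters are illustrative.
import torch
import torch.nn as nn

p = 97                    # modulus; the vocabulary is the residues 0..p-1
frac_train = 0.3          # small training split makes the memorize-then-generalize gap visible
device = "cuda" if torch.cuda.is_available() else "cpu"

# Full dataset of (a, b) -> (a + b) mod p, split randomly into train/test.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class TinyTransformer(nn.Module):
    def __init__(self, p, d_model=128, n_heads=4, n_layers=1):
        super().__init__()
        self.embed = nn.Embedding(p, d_model)
        self.pos = nn.Parameter(torch.randn(2, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, p)

    def forward(self, x):                # x: (batch, 2) token ids
        h = self.embed(x) + self.pos     # (batch, 2, d_model)
        h = self.encoder(h)
        return self.head(h.mean(dim=1))  # pool both positions, predict the sum mod p

model = TinyTransformer(p).to(device)
# Weight decay is the ingredient most commonly credited with inducing grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

x_train, y_train = pairs[train_idx].to(device), labels[train_idx].to(device)
x_test, y_test = pairs[test_idx].to(device), labels[test_idx].to(device)

for step in range(50_000):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

    if step % 1000 == 0:
        model.eval()
        with torch.no_grad():
            train_acc = (model(x_train).argmax(-1) == y_train).float().mean()
            test_acc = (model(x_test).argmax(-1) == y_test).float().mean()
        # Expected pattern: train_acc hits ~1.0 early; test_acc stays near chance
        # for a long stretch, then climbs toward ~1.0 much later.
        print(f"step {step:6d}  loss {loss.item():.4f}  "
              f"train_acc {train_acc:.3f}  test_acc {test_acc:.3f}")
```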
@lain this was a cool video. I'm looking forward to when we develop a rigorous way to analyze AI models, similar to how we use physics to understand natural phenomena.
at the end of the day, they're very high-dimensional functions, and a lot of the math needed to analyze them probably exists but is so hyper-specialized that few people know about it.
@genmaicha Anthropic has a bunch of research on these topics, and it's generally very good: https://www.anthropic.com/research
@lain bro has the worst desk setup