Very well-made video about the somewhat mysterious phenomenon of 'grokking' in transformers, where they quickly memorize all the training data within a few hundred steps yet still fail completely on the test data, then seem not to improve at all for thousands of steps before suddenly generalizing perfectly: https://www.youtube.com/watch?v=D8GOeCFFby4
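For anyone who wants to poke at the effect themselves, here's a minimal sketch of the setup the grokking papers usually use: a tiny transformer trained on modular addition with a small training split and heavy weight decay. The model, hyperparameters, and step counts below are illustrative assumptions on my part, not the exact configuration from the video.

```python
# Minimal grokking sketch: tiny transformer on (a + b) mod p with a small
# training fraction and strong weight decay. Hyperparameters are illustrative.
import torch
import torch.nn as nn

p = 97                    # modulus; the vocabulary is the residues 0..p-1
frac_train = 0.3          # small training split makes the memorize-then-generalize gap visible
device = "cuda" if torch.cuda.is_available() else "cpu"

# Full dataset of (a, b) -> (a + b) mod p, split randomly into train/test.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class TinyTransformer(nn.Module):
    def __init__(self, p, d_model=128, n_heads=4, n_layers=1):
        super().__init__()
        self.embed = nn.Embedding(p, d_model)
        self.pos = nn.Parameter(torch.randn(2, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, p)

    def forward(self, x):                # x: (batch, 2) token ids
        h = self.embed(x) + self.pos     # (batch, 2, d_model)
        h = self.encoder(h)
        return self.head(h.mean(dim=1))  # pool both positions, predict the sum mod p

model = TinyTransformer(p).to(device)
# Weight decay is the ingredient most commonly credited with inducing grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

x_train, y_train = pairs[train_idx].to(device), labels[train_idx].to(device)
x_test, y_test = pairs[test_idx].to(device), labels[test_idx].to(device)

for step in range(50_000):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

    if step % 1000 == 0:
        model.eval()
        with torch.no_grad():
            train_acc = (model(x_train).argmax(-1) == y_train).float().mean()
            test_acc = (model(x_test).argmax(-1) == y_test).float().mean()
        # Expected pattern: train_acc hits ~1.0 early; test_acc stays near chance
        # for a long stretch, then climbs toward ~1.0 much later.
        print(f"step {step:6d}  loss {loss.item():.4f}  "
              f"train_acc {train_acc:.3f}  test_acc {test_acc:.3f}")
```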
@lain this was a cool video. I'm looking forward to when we develop a rigorous way to analyze AI models, similar to how we use physics to understand natural phenomena.
at the end of the day, they're very high-dimensional functions, and a lot of the math needed to analyze them probably exists but is so hyper-specialized that few people know about it.
@genmaicha Anthropic has a bunch of research on these topics, and it's generally very good: https://www.anthropic.com/research
@lain bro has the worst desk setup