I found some shit, with math.
Theoretically, I can optimize an 80B model to run in 12 GB of VRAM. The math checks out, and transformers are deterministic, so this should work.
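For reference, here is the basic weight-memory arithmetic behind a claim like this. It's a quick sketch under my own assumptions: it counts only the weights (no KV cache, activations, or framework overhead), uses decimal GB (1 GB = 1e9 bytes), and the helper names are just made up for illustration.

```python
# Back-of-the-envelope weight memory for a dense model.
# Assumptions: weights only, decimal GB (1e9 bytes), uniform bits per parameter.

def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to store n_params weights at bits_per_param each."""
    return n_params * bits_per_param / 8

def max_bits_per_param(n_params: float, budget_bytes: float) -> float:
    """Largest average bits per parameter that fits in budget_bytes."""
    return budget_bytes * 8 / n_params

N_PARAMS = 80e9   # 80B parameters
VRAM = 12e9       # 12 GB, decimal

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights: {weight_bytes(N_PARAMS, bits) / 1e9:.0f} GB")

print(f"budget: {max_bits_per_param(N_PARAMS, VRAM):.2f} bits/param")
```

So keeping all the weights resident in 12 GB at once works out to roughly 1.2 bits per parameter on average, before counting anything else.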
Stay tuned.