Wanderer atop the sea of clouds or whatever

13w

unsloth actually publishes this graph for their quantizations (compressed versions of the full-precision models). The tl;dr of the vertical axis is basically how much the quant's output diverges from the base model's.
This is only divergence from base, and not a predictor of output quality.
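
For illustration, one common way to measure this kind of divergence is the KL divergence between the base model's and the quantized model's next-token distributions. The post does not say which metric the graph uses, so treat KL here as an assumption. A minimal sketch, with hypothetical function names and plain lists of logits standing in for real model output:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(base_logits, quant_logits):
    # KL(base || quant) over one next-token distribution.
    # Assumes both logit vectors cover the same vocabulary and are
    # not so extreme that a probability underflows to zero.
    p = softmax(base_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(base_batch, quant_batch):
    # Average the per-position divergence over a set of token positions.
    pairs = list(zip(base_batch, quant_batch))
    return sum(kl_divergence(b, q) for b, q in pairs) / len(pairs)
```

A value of 0 means the quant reproduces the base distribution exactly; larger values mean more drift. As the post says, that is drift from the base model only and says nothing by itself about benchmark quality.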

Unfortunately, this doesn't help me decide whether I should pick a 2N-billion-parameter model at Q2 or an N-billion one at Q4.
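
Part of why that choice is hard: on a back-of-the-envelope basis the two options take about the same memory. A rough sketch, assuming nominal bits per weight (2 for Q2, 4 for Q4) and ignoring context/KV cache and file overhead; real quant files typically use somewhat more bits per weight than the nominal figure, and `weights_gib` is a hypothetical helper:

```python
def weights_gib(params_billion, bits_per_weight):
    # Weight storage only: parameters * bits per weight, converted to GiB.
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Example with N = 7 (illustrative numbers, not from the thread):
big_low_bits = weights_gib(14, 2)   # 2N billion parameters at nominal 2 bits
small_high_bits = weights_gib(7, 4) # N billion parameters at nominal 4 bits
```

Both come out to the same size, so memory alone cannot break the tie; that is why benchmarks of the quantized models themselves would be needed.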

I'd need the benchmarks for the quantized ones.
Or some formula to estimate the bench score drop.
Besides the fact that model makers can game the benchmarks, of course.

RT: https://poa.st/objects/88333ebf-5996-47ee-9f31-e6cc41951d08

Replying to @WandererUber@poa.st

HAPPYPOTAMUS

@HAPPYPOTAMUS@nicecrew.digital

13w

FWIW I've had some good runs using the new IQ2/IQ4 quants instead of the regular Q4 I'd been using

Replying to @HAPPYPOTAMUS@nicecrew.digital

Wanderer atop the sea of clouds or whatever

@WandererUber@poa.st

13w

imatrix quants supposedly preserve the model better AT their size than the old ones, yes.
But if you look at that graph, unsloth makes iquants that are smaller than the regular ones, and those allegedly perform worse than the bigger ones.

What models and hardware are you running over there?

Replying to @WandererUber@poa.st

HAPPYPOTAMUS

@HAPPYPOTAMUS@nicecrew.digital

13w

Just the qwen3.6 one. I have a shitty 3070 8 GB but I can squeeze out some barely acceptable performance

Replying to @HAPPYPOTAMUS@nicecrew.digital

Wanderer atop the sea of clouds or whatever

@WandererUber@poa.st

13w

Man you don't know the half of it...

If you got PCIe 4.0 on that thing you should be flying compared to lil old me

I'm glad I've got a shitload of system RAM so I can at least run the bigger models... At 4 t/s
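
For context on why running from system RAM is slow: a common rule of thumb is that single-user token generation on a dense model is limited by memory bandwidth, since roughly all the weights are read once per generated token. A rough sketch of that upper bound, with a hypothetical helper and made-up numbers (the thread does not give the actual hardware or model size):

```python
def max_tokens_per_second(bandwidth_gb_s, model_size_gb):
    # Upper bound only: assumes a dense model, batch size 1, and that
    # every weight is read from memory once per generated token.
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb

# Hypothetical example: 50 GB/s of memory bandwidth, 12.5 GB of weights.
estimate = max_tokens_per_second(50, 12.5)
```

With those illustrative numbers the bound is 4 tokens per second; real throughput is usually lower, and models that only activate part of their weights per token behave differently.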