Unsloth actually publishes this graph for their quantizations (compressed versions of the original models). The tl;dr of the vertical axis is basically how much the quant's output diverges from the base model.
This is only divergence from base, and not a predictor of output quality.
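To make "divergence from base" concrete, here's a minimal sketch, assuming the graph plots something like KL divergence between the base and quantized model's next-token distributions (that's my assumption, the post doesn't say which metric it is). The logits below are made-up toy numbers, not real model output.

```python
import math

def softmax(logits):
    # Turn raw logits into a probability distribution (numerically stable).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the quant's distribution q is from the base's p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 3-token vocabulary, for illustration only.
base_logits = [2.0, 1.0, 0.1]
quant_logits = [1.8, 1.2, 0.1]

kl = kl_divergence(softmax(base_logits), softmax(quant_logits))
print(f"KL(base || quant) = {kl:.4f}")
```

A value of 0 means the quant matches the base exactly on that token. It says nothing about whether the base's answer was any good in the first place.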
Unfortunately, this does not help me decide whether I should pick a 2N-billion-parameter model at Q2 or an N-billion one at Q4.
I'd need the benchmarks for the quantized ones.
Or some formula to estimate the bench score drop.
Besides the fact that model makers can game the benchmarks, of course.
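For context on why 2N at Q2 vs. N at Q4 is a real dilemma: at nominal bit widths the two come out to the same weight size. A rough sketch, assuming Q2 and Q4 mean exactly 2 and 4 bits per weight (real quant formats typically use somewhat more than the name suggests) and ignoring KV cache and runtime overhead. `weight_size_gb` is just an illustrative helper, and N = 7 is an arbitrary pick.

```python
def weight_size_gb(params_billion, bits_per_weight):
    # Approximate weight storage in GB: params * bits / 8 bits per byte.
    return params_billion * bits_per_weight / 8

n = 7  # hypothetical N, in billions of parameters

big_q2 = weight_size_gb(2 * n, 2)   # 2N billion parameters at ~2 bits
small_q4 = weight_size_gb(n, 4)     # N billion parameters at ~4 bits

print(f"{2 * n}B @ Q2: {big_q2} GB")
print(f"{n}B @ Q4: {small_q4} GB")
```

Same footprint on paper, so the memory budget alone can't pick a winner; only benchmark scores for the quantized models themselves could.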
RT: https://poa.st/objects/88333ebf-5996-47ee-9f31-e6cc41951d08