Replies: 4 comments 7 replies
-
With Blackwell we now have native FP4, so this would be amazing.
-
There is an attempt at fp8: #10055. For hardware support, what I'm not sure about is the format; there are at least two: https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md
-
In an experiment I simply replaced IQ4_NL with an NF4 quant and an FP4 quant, and both worked well. It's as simple as replacing the kvalues LUT:

```c
static const int8_t kvalues_iq4nl[16] = {-127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113};
```
-
Actually 4 dtypes:
-
What about adding low-bit floating point data types? The default would be to dequantize to fp16.
On top of that, I'm writing an efficient SIMD backend that works directly in quantized space for the data types listed below. Widths up to 8 bits are supported with my approach.
In particular, nf4, nf4dq (double quant) and fp4 would enable lossless conversion of existing 4-bit bnb-encoded models from HF.
I just want to hear your opinions on and interest in this topic. I'll come back with a PR at some point, once my backend is ready.
Also, if I missed an important low-bit data type, just point it out.
- nf4
- nf4dq
- fp8e5m2
- fp8e4m3
- fp8e3m4
- fp6e4m1
- fp6e3m2
- fp6e2m3
- fp5e3m1
- fp5e2m2
- fp5e1m3
- fp4e3m0
- fp4e2m1
- fp4e1m2
- fp3e2m0
- fp3e1m1