wenhua cheng

wenhuach

wenhuach21

AI & ML interests

Model Compression, CV

Organizations

Posts 2

Post

Looking for a better int4 algorithm for LLAMA3.1? On the 8B model, AutoRound averages 63.93 across 10 zero-shot tasks versus 63.15 for AWQ. Notably, it scores 66.72 versus 65.25 on MMLU and 52.13 versus 50.94 on ARC-C. For further details and comparisons, see the leaderboard at Intel/low_bit_open_llm_leaderboard.
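
To make the size of those gaps explicit, here is a minimal sketch that computes the AutoRound minus AWQ difference for each number quoted in the post. The scores are copied from the post; the dictionary layout and the `score_gaps` helper are hypothetical names introduced only for illustration and are not part of AutoRound or the leaderboard.

```python
# Scores copied from the post: (AutoRound, AWQ) for LLAMA3.1 8B at int4.
# The table layout and helper name are illustrative assumptions.
scores = {
    "avg_10_zero_shot": (63.93, 63.15),
    "MMLU": (66.72, 65.25),
    "ARC-C": (52.13, 50.94),
}


def score_gaps(table):
    """Return AutoRound minus AWQ per task, rounded to two decimals."""
    return {task: round(auto_round - awq, 2)
            for task, (auto_round, awq) in table.items()}


gaps = score_gaps(scores)
for task, gap in gaps.items():
    print(f"{task}: +{gap:.2f} points")
```

The differences are absolute accuracy points, not relative percentages.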

Post

Check out AutoRound, a SOTA LLM quantization algorithm for 2-4 bits that adds no inference overhead to any model.
paper: https://arxiv.org/abs/2309.05516
github: https://github.com/intel/auto-round
lowbits leaderboard: https://huggingface.co/spaces/Intel/low-bit-leaderboard

Papers 2

arxiv:2310.10944

arxiv:2309.05516

Models

None public yet

Datasets

None public yet