Quantization Model

#1
by huangleiabcde - opened

Hi, have you ever tried a quantized version of your model? How did it perform compared to Llama-3-70B-Instruct q4? And what's the estimated GPU memory usage if we use the Llama-3-Giraffe-70B-Instruct q4 model with an input context of 120k tokens?
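For reference, here is a back-of-envelope memory estimate. It assumes the standard Llama-3-70B architecture values (80 layers, 8 KV heads via GQA, head dim 128), roughly 4.5 bits per weight for a q4-style quant including overhead, and an fp16 KV cache; actual usage depends on the runtime, quant format, and whether the KV cache is also quantized.

```python
# Rough GPU memory estimate for Llama-3-Giraffe-70B q4 with a 120k-token context.
# Architecture values are the published Llama-3-70B ones; q4 size is approximate.

PARAMS = 70e9            # parameter count
BITS_PER_WEIGHT = 4.5    # q4 quant with scales/zeros overhead (assumption)
N_LAYERS = 80            # Llama-3-70B transformer layers
N_KV_HEADS = 8           # grouped-query attention KV heads
HEAD_DIM = 128           # per-head dimension
KV_BYTES = 2             # fp16 cache (assumption; some runtimes quantize this)
SEQ_LEN = 120_000        # requested context length

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9

# KV cache: 2 (key + value) * layers * kv_heads * head_dim * bytes, per token
kv_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
kv_cache_gb = kv_per_token * SEQ_LEN / 1e9

print(f"weights  ~{weights_gb:.1f} GB")    # ~39.4 GB
print(f"KV cache ~{kv_cache_gb:.1f} GB")   # ~39.3 GB
print(f"total    ~{weights_gb + kv_cache_gb:.1f} GB (plus activations/overhead)")
```

So the KV cache at 120k tokens is roughly as large as the quantized weights themselves, putting the total in the ~80 GB range before activation and framework overhead.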
