BramVanroy/fietje-2-instruct-gguf

	---
	license: mit
	language:
	- nl
	tags:
	- gguf
	---

	This repository contains quantized versions of [BramVanroy/fietje-2-instruct](https://huggingface.co/BramVanroy/fietje-2-instruct).
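
A minimal sketch of how one of these files could be fetched with the `huggingface_hub` Python package. The exact file names in this repository are not listed here, so the helper `pick_gguf` (a hypothetical name introduced for illustration) searches the repository listing for a `.gguf` file whose name contains the requested quantization label. That naming convention is an assumption; adjust the matching if the files are named differently.

```python
from typing import Iterable, Optional

# Repository name as shown on this page.
REPO_ID = "BramVanroy/fietje-2-instruct-gguf"


def pick_gguf(filenames: Iterable[str], quant: str) -> Optional[str]:
    """Return the first .gguf file whose name contains the quant label.

    Matching is case-insensitive. It assumes the quantization type is part
    of the file name, which is common for GGUF repositories but not
    guaranteed.
    """
    wanted = quant.lower()
    for name in sorted(filenames):
        lowered = name.lower()
        if lowered.endswith(".gguf") and wanted in lowered:
            return name
    return None


def download_quant(quant: str = "Q4_K_M") -> str:
    """Download the matching GGUF file and return its local path.

    Requires network access and the huggingface_hub package.
    """
    # Imported lazily so pick_gguf works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    filename = pick_gguf(list_repo_files(REPO_ID), quant)
    if filename is None:
        raise FileNotFoundError(f"No .gguf file matching {quant!r} in {REPO_ID}")
    return hf_hub_download(repo_id=REPO_ID, filename=filename)
```

Calling `download_quant("Q5_K_M")` would then return the path of the cached file, which can be passed to any GGUF-compatible runtime.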


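	Available quantization types, with the file size and perplexity increase relative to `f16` that llama.cpp reports for LLaMA-v1-7B (higher perplexity is worse). These are reference figures for a 7B model, not measurements on this model: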

	```
	Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
	Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
	Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
	Q6_K : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
	Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
	F16 : 13.00G @ 7B
	```
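
Because the sizes above are for a 7B reference model, it can help to turn them into ratios. The sketch below derives each type's size relative to `F16` and an approximate bits-per-weight figure (taking `F16` as 16 bits per weight), then scales a given `F16` file size by that ratio. It assumes size scales roughly linearly with the `F16` size, which is only an approximation; the function names are hypothetical and the example input is illustrative, not a measured value for this model.

```python
# Reference figures copied from the llama.cpp table above
# (LLaMA-v1-7B, sizes in GB).
REFERENCE_SIZES_GB = {
    "Q3_K_M": 3.07,
    "Q4_K_M": 3.80,
    "Q5_K_M": 4.45,
    "Q6_K": 5.15,
    "Q8_0": 6.70,
    "F16": 13.00,
}


def relative_size(quant: str) -> float:
    """Size of a quantization type as a fraction of the F16 size."""
    return REFERENCE_SIZES_GB[quant] / REFERENCE_SIZES_GB["F16"]


def approx_bits_per_weight(quant: str) -> float:
    """Rough bits per weight, taking F16 as 16 bits per weight."""
    return 16.0 * relative_size(quant)


def estimate_size_gb(quant: str, f16_size_gb: float) -> float:
    """Scale an F16 file size by the reference ratio (linear approximation)."""
    return f16_size_gb * relative_size(quant)


if __name__ == "__main__":
    # 5.0 GB is an illustrative F16 size, not this model's actual size.
    for quant in REFERENCE_SIZES_GB:
        print(
            f"{quant:7s} ratio={relative_size(quant):.3f} "
            f"bits/weight~{approx_bits_per_weight(quant):.2f} "
            f"estimated={estimate_size_gb(quant, 5.0):.2f} GB"
        )
```

The perplexity deltas do not transfer in the same way: they were measured on LLaMA-v1-7B and can only indicate the relative ordering of the quantization types.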

	Also available on [ollama](https://ollama.com/bramvanroy/fietje-2b-instruct).

	Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.