---
base_model: HuggingFaceH4/zephyr-7b-beta
inference: false
license: mit
model_creator: HuggingFaceH4
model_name: Zephyr 7B Beta
model_type: mistral
prompt_template: |
  <|system|>
  </s>
  <|user|>
  {prompt}</s>
  <|assistant|>
quantized_by: Semantically AI
pruned_by: Semantically AI
---
# Zephyr 7B Beta - GGUF

- Model creator: [Hugging Face H4](https://huggingface.co/HuggingFaceH4)
- Original model: [Zephyr 7B Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

<!-- description start -->

## Description

This repo contains GGUF format model files for [Hugging Face H4's Zephyr 7B Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).

<!-- description end -->

<!-- README_GGUF.md-about-gguf start -->

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

<!-- README_GGUF.md-about-gguf end -->
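The GGUF file can be downloaded programmatically with the `huggingface_hub` library. The snippet below is a minimal sketch: the `repo_id` shown is a placeholder (substitute the id of the repo this card is published under), while the filename matches the one used in the example further down.

```python
from huggingface_hub import hf_hub_download

# Fetch the pruned, quantised GGUF file from the Hub.
# NOTE: "your-org/zephyr-7b-beta-pruned50-GGUF" is a placeholder repo id.
model_path = hf_hub_download(
    repo_id="your-org/zephyr-7b-beta-pruned50-GGUF",
    filename="zephyr-7b-beta-pruned50-Q8_0.gguf",
)
print(model_path)  # local path to the downloaded .gguf file
```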
<!-- prompt-template start -->

## Prompt template: Zephyr

```
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
```
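The template can be filled in with plain string formatting. The snippet below is a minimal sketch; the `system_message` and `prompt` values are simply the ones used in the example further down this card.

```python
# Fill the Zephyr template with a system message and a user prompt.
system_message = "You are a friendly chatbot who always responds in the style of a pirate."
prompt = "How many helicopters can a human eat in one sitting?"

formatted_prompt = (
    f"<|system|>\n{system_message}</s>\n"
    f"<|user|>\n{prompt}</s>\n"
    f"<|assistant|>\n"
)
print(formatted_prompt)
```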
<!-- prompt-template end -->

<!-- explanation start -->

## Explanation of pruning and quantisation methods

<details>
<summary>Click to see details</summary>

The GGUF model is pruned to 50% sparsity using the [SparseGPT](https://github.com/IST-DASLab/sparsegpt) method.
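As a rough illustration of what 50% unstructured sparsity means, the sketch below applies simple magnitude pruning to a random matrix. This is only a toy example: SparseGPT itself selects weights using approximate second-order information and updates the remaining weights to compensate, rather than thresholding by magnitude.

```python
import numpy as np

# Toy illustration of 50% unstructured sparsity (magnitude pruning on a random matrix).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))

threshold = np.median(np.abs(W))                    # median magnitude -> keep the larger half
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

print(f"sparsity: {np.mean(W_pruned == 0.0):.2f}")  # ~0.50
```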
</details>

<!-- explanation end -->
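## Example usage with llama-cpp-python

The pruned GGUF file can be loaded and run with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings, for example: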
```python
from llama_cpp import Llama

# Load the pruned, quantised GGUF model (adjust the path if the file is stored elsewhere).
llm = Llama(model_path="zephyr-7b-beta-pruned50-Q8_0.gguf")

# The prompt follows the Zephyr template shown above.
output = llm("""<|system|>
You are a friendly chatbot who always responds in the style of a pirate.</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>""", max_tokens=256)  # generate up to 256 new tokens

print(output)
```
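The call returns an OpenAI-style completion dictionary; the generated reply itself is available under `output["choices"][0]["text"]`.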
<!-- README_GGUF.md-how-to-run start -->