---
base_model: HuggingFaceH4/zephyr-7b-beta
inference: false
license: mit
model_creator: HuggingFaceH4
model_name: Zephyr 7B Beta
model_type: mistral
prompt_template: |
  <|system|>
  <|user|>
  {prompt}
  <|assistant|>
quantized_by: Semantically AI
pruned_by: Semantically AI
---

# Zephyr 7B Beta - GGUF

- Model creator: [Hugging Face H4](https://huggingface.co/HuggingFaceH4)
- Original model: [Zephyr 7B Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

## Description

This repo contains GGUF format model files for [Hugging Face H4's Zephyr 7B Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).

GGUF is a format introduced by the llama.cpp team on August 21st, 2023, as a replacement for GGML, which is no longer supported by llama.cpp.

## Prompt template: Zephyr

```
<|system|>
<|user|>
{prompt}
<|assistant|>
```

## Pruning and quantisation
The GGUF model is pruned to 50% sparsity using the one-shot [SparseGPT](https://github.com/IST-DASLab/sparsegpt) pruning method.
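To make the sparsity level concrete, the sketch below zeroes the smallest-magnitude half of a weight matrix. This is plain magnitude pruning for illustration only; SparseGPT additionally solves a layer-wise reconstruction problem using second-order information so the remaining weights compensate for the pruned ones. The function name and matrix shape are illustrative, not taken from this repo.

```python
import numpy as np

def prune_to_sparsity(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero the smallest-magnitude entries so `sparsity` of them become zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]            # k-th smallest magnitude
    return weights * (np.abs(weights) >= threshold)  # keep the larger half

w = np.random.randn(512, 512).astype(np.float32)
w_pruned = prune_to_sparsity(w, sparsity=0.5)
print(f"sparsity: {np.mean(w_pruned == 0):.2%}")  # ~50.00%
```

## Example usage with llama-cpp-python

The quantised file can be loaded with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) (`pip install llama-cpp-python`). A minimal sketch, assuming `zephyr-7b-beta-pruned50-Q8_0.gguf` is in the working directory: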
```python
from llama_cpp import Llama

# Load the pruned, Q8_0-quantised GGUF file
llm = Llama(model_path="zephyr-7b-beta-pruned50-Q8_0.gguf")

# The prompt follows the Zephyr template shown above
output = llm(
    """<|system|>
You are a friendly chatbot who always responds in the style of a pirate.
<|user|>
How many helicopters can a human eat in one sitting?
<|assistant|>""",
    max_tokens=256,  # the default of 16 truncates most answers
)

print(output)  # the generated text is in output["choices"][0]["text"]
```
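For multi-turn use, recent versions of llama-cpp-python can apply the Zephyr template themselves through the chat-completion API; a sketch, assuming a build whose built-in chat formats include `zephyr`:

```python
from llama_cpp import Llama

# chat_format="zephyr" inserts the <|system|>/<|user|>/<|assistant|> tags for you
llm = Llama(model_path="zephyr-7b-beta-pruned50-Q8_0.gguf", chat_format="zephyr")

response = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You are a friendly chatbot who always responds in the style of a pirate."},
        {"role": "user",
         "content": "How many helicopters can a human eat in one sitting?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```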