
RWKV-6-World-3B-v2.1-GGUF

This repo contains RWKV-6-World-3B-v2.1 quantized to GGUF with llama.cpp (build b3651).

How to run the model

  • Get the latest llama.cpp and build it:
git clone https://github.com/ggerganov/llama.cpp
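After cloning, build the llama-cli binary. A minimal sketch using the Makefile build (still available at b3651; it places llama-cli in the repo root, so the command in the last step works as written; use the CMake build instead if you need GPU backends):

cd llama.cpp
# builds ./llama-cli in the repo root; adjust for your platform
make llama-cli -j
cd ..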
  • Download the GGUF file to a new "model" folder inside llama.cpp (choose your quant):
cd llama.cpp
mkdir model
git clone https://huggingface.co/Lyte/RWKV-6-World-3B-v2.1-GGUF
mv RWKV-6-World-3B-v2.1-GGUF/RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf model/
rm -r RWKV-6-World-3B-v2.1-GGUF
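Alternatively, if you have the huggingface_hub CLI installed, you can fetch just the quant you need instead of cloning the whole repo (a sketch; swap in whichever quant filename you want):

# assumes `pip install -U huggingface_hub` for the huggingface-cli tool
huggingface-cli download Lyte/RWKV-6-World-3B-v2.1-GGUF RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf --local-dir model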
  • On Windows, instead of git-cloning the repo, create a "model" folder inside the llama.cpp folder, then open this repo's "Files and versions" tab and download the quant you want into that folder.

  • Now to run the model, you can use the following command:

./llama-cli -m ./model/RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf --in-suffix "Assistant:" --interactive-first -c 1024 --temp 0.7 --top-k 50 --top-p 0.95 -n 128 -p "Assistant: Hello, what can I help you with today?\nUser:" -r "User:"
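If you prefer an HTTP endpoint over the interactive CLI, llama.cpp also ships llama-server; a minimal sketch with the same model (the port and context size here are arbitrary choices):

./llama-server -m ./model/RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf -c 1024 --port 8080

This serves an OpenAI-compatible API and a simple web UI at http://localhost:8080.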
Model details

  • Model size: 3.1B params
  • Architecture: rwkv6
