---
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
language:
- en
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
- llama-cpp
- gguf-my-repo
---
# smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF

This model was converted to GGUF format from [`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for more details on the model.
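If you want a local copy of the GGUF file itself (for example, to point the Ollama Modelfile below at it), one way is the `huggingface-cli` tool from the `huggingface_hub` package; the destination directory here is just an example:

```bash
# Download the quantised model file into the current directory (example destination).
huggingface-cli download smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF qwen2.5-coder-7b-instruct-q8_0.gguf --local-dir .
```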
## Ollama Modelfile (draft/beta!)
```
# ollama create qwen2.5-coder-7b-instruct:q8_0 -f modelfiles/Modelfile-qwen2.5-coder

FROM ../qwen2.5-coder-7b-instruct-q8_0.gguf

# This is Sam's hacked-up template, 2024-09-19
TEMPLATE """
{{- $fim_prefix := .FIMPrefix -}}
{{- $fim_suffix := .FIMSuffix -}}
{{- $repo_name := .RepoName -}}
{{- $files := .Files -}}
{{- $has_tools := gt (len .Tools) 0 -}}
{{- if $fim_prefix -}}
<|fim_prefix|>{{ $fim_prefix }}<|fim_suffix|>{{ $fim_suffix }}<|fim_middle|>
{{- else if $repo_name -}}
<|repo_name|>{{ $repo_name }}
{{- range $files }}
<|file_sep|>{{ .Path }}
{{ .Content }}
{{- end }}
{{- else -}}
{{- if or .System $has_tools -}}
<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if $has_tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}
<|im_end|>
{{- end }}
{{- if .Messages }}
{{- range $i, $message := .Messages }}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{- else if eq .Role "assistant" }}<|im_start|>assistant
{{- if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{- range .ToolCalls }}
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}
</tool_call>
{{- end }}<|im_end|>
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{- end }}
{{- end }}
{{- else if .Prompt -}}
<|im_start|>user
{{ .Prompt }}<|im_end|>
{{- end -}}
<|im_start|>assistant
{{ .Response }}
{{- end -}}
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|fim_prefix|>"
PARAMETER stop "<|fim_suffix|>"
PARAMETER stop "<|fim_middle|>"
PARAMETER stop "<|repo_name|>"
PARAMETER stop "<|file_sep|>"

### Tuning ###
PARAMETER num_ctx 16384
PARAMETER temperature 0.3
PARAMETER top_p 0.8

# PARAMETER num_batch 1024
# PARAMETER num_keep 512
# PARAMETER presence_penalty 0.2
# PARAMETER frequency_penalty 0.2
# PARAMETER repeat_last_n 50
```
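To build and run the model from this Modelfile, something like the following should work, assuming the GGUF file sits one directory above the Modelfile (as the `FROM` line expects) and using the tag from the comment at the top:

```bash
# Register the model with Ollama, then chat with it.
ollama create qwen2.5-coder-7b-instruct:q8_0 -f modelfiles/Modelfile-qwen2.5-coder
ollama run qwen2.5-coder-7b-instruct:q8_0 "Write a Python function that reverses a string."
```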
## Use with llama.cpp

Install llama.cpp through brew (works on macOS and Linux):
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --hf-file qwen2.5-coder-7b-instruct-q8_0.gguf -p "The meaning to life and the universe is"
```
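Since this is an instruct-tuned model, you may prefer conversation mode over a one-shot completion prompt; this is a sketch, and the `-cnv` flag depends on your llama.cpp version:

```bash
# Start an interactive chat session instead of a raw completion.
llama-cli --hf-repo smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --hf-file qwen2.5-coder-7b-instruct-q8_0.gguf -cnv
```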
### Server:
```bash
llama-server --hf-repo smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --hf-file qwen2.5-coder-7b-instruct-q8_0.gguf -c 2048
```
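Once the server is running, you can query its OpenAI-compatible chat endpoint; this sketch assumes the default port 8080 (change it with `--port` if needed):

```bash
# Ask the model a coding question via the OpenAI-compatible API.
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "Write a Python function that reverses a string."}
  ]
}'
```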
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
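For example, a CUDA-enabled build on a Linux machine with an NVIDIA GPU might look like the following sketch (`-j` simply parallelises the build):

```
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j
```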
Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --hf-file qwen2.5-coder-7b-instruct-q8_0.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo smcleod/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --hf-file qwen2.5-coder-7b-instruct-q8_0.gguf -c 2048
```