feihu.hf committed
Commit aaab942 (1 parent: 515fcfd)

update README & LICENSE

Files changed (1):
  1. README.md +2 -0
README.md CHANGED

@@ -9,6 +9,7 @@ base_model: Qwen/Qwen2.5-72B-Instruct
 tags:
 - chat
 ---
+
 # Qwen2.5-72B-Instruct-GGUF
 
 ## Introduction
@@ -29,6 +30,7 @@ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we rele
 - Number of Layers: 80
 - Number of Attention Heads (GQA): 64 for Q and 8 for KV
 - Context Length: Full 32,768 tokens and generation 8192 tokens
+- Note: Currently, only vLLM supports YARN for length extrapolating. If you want to process sequences up to 131,072 tokens, please refer to non-GGUF models.
 - Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
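
For context on the note added in the second hunk: the non-GGUF Qwen2.5 model cards describe enabling YaRN length extrapolation (up to 131,072 tokens) by adding a `rope_scaling` entry to the model's `config.json`. A sketch of that fragment is shown below for illustration; the exact values and supported inference stacks should be checked against the Qwen2.5 documentation linked above.

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

A factor of 4.0 over the 32,768-token base context corresponds to the 131,072-token limit mentioned in the diff; this mechanism is not available for the GGUF builds, which is why the note redirects long-context users to the non-GGUF models.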