Mxode committed on
Commit
d1f3901
1 Parent(s): 7338464

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -33,7 +33,18 @@ This is NanoLM-0.3B-Instruct-v1.1. The model currently supports both **Chinese a
 
 ## Model Details
 
-The tokenizer and model architecture of NanoLM-0.3B-Instruct-v1.1 are the same as [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B), but the number of layers has been reduced from 24 to 12. As a result, NanoLM-0.3B-Instruct-v1.1 has only 0.3 billion parameters, with approximately **180 million non-embedding parameters**. Despite this, NanoLM-0.3B-Instruct-v1.1 still demonstrates strong instruction-following capabilities.
+| Nano LMs | Non-emb Params | Arch | Layers | Dim | Heads | Seq Len |
+| :----------: | :------------------: | :---: | :----: | :-------: | :---: | :---: |
+| 25M | 15M | MistralForCausalLM | 12 | 312 | 12 |2K|
+| 70M | 42M | LlamaForCausalLM | 12 | 576 | 9 |2K|
+| 0.3B | 180M | Qwen2ForCausalLM | 12 | 896 | 14 |4K|
+| 1B | 840M | Qwen2ForCausalLM | 18 | 1536 | 12 |4K|
+
+The tokenizer and model architecture of NanoLM-0.3B-Instruct-v1.1 are the same as [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B), but the number of layers has been reduced from 24 to 12.
+
+As a result, NanoLM-0.3B-Instruct-v1.1 has only 0.3 billion parameters, with approximately **180 million non-embedding parameters**.
+
+Despite this, NanoLM-0.3B-Instruct-v1.1 still demonstrates strong instruction-following capabilities.
 
 Here are some examples. For reproducibility purposes, I've set `do_sample` to `False`. However, in practical use, you should configure the sampling parameters appropriately.
 
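The "approximately 180 million non-embedding parameters" figure for the 0.3B row can be sanity-checked with a rough count. This is only a sketch: the layer count, hidden size, and head count come from the table in this commit, while `intermediate_size=4864`, `num_kv_heads=2`, `vocab_size=151936`, and tied input/output embeddings are assumptions taken from the published Qwen2-0.5B configuration, not stated in this README. The helper name `qwen2_param_counts` is hypothetical.

```python
def qwen2_param_counts(num_layers, hidden, intermediate, num_heads, num_kv_heads, vocab):
    """Rough parameter count for a Qwen2-style decoder (assumed layout)."""
    head_dim = hidden // num_heads
    kv_dim = head_dim * num_kv_heads  # grouped-query attention: fewer K/V heads

    # Attention: q/k/v projections carry a bias, the output projection does not.
    attn = (hidden * hidden + hidden)          # q_proj
    attn += 2 * (hidden * kv_dim + kv_dim)     # k_proj and v_proj
    attn += hidden * hidden                    # o_proj

    mlp = 3 * hidden * intermediate            # gate, up, and down projections
    norms = 2 * hidden                         # two RMSNorm weights per layer

    non_emb = num_layers * (attn + mlp + norms) + hidden  # plus the final norm
    emb = vocab * hidden                       # counted once, assuming tied embeddings
    return non_emb, emb


# 0.3B row: 12 layers, dim 896, 14 heads (from the table); the rest is assumed.
non_emb, emb = qwen2_param_counts(
    num_layers=12, hidden=896, intermediate=4864,
    num_heads=14, num_kv_heads=2, vocab=151936,
)
print(f"non-embedding: {non_emb / 1e6:.1f}M")
print(f"embedding:     {emb / 1e6:.1f}M")
print(f"total:         {(non_emb + emb) / 1e9:.2f}B")
```

Under these assumptions the non-embedding count lands just under 180M and the total near 0.3B, which is consistent with the numbers quoted in the README. A different vocabulary size or untied embeddings would shift the total but not the non-embedding figure.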