legolasyiu committed
Commit a2905ab
1 Parent(s): 20b20bc

Update README.md

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -11,6 +11,37 @@ tags:
  - trl
  ---
 
+ # Fireball-Mistral-Nemo-12B-Philos
+ Supervised fine-tuned on a dataset covering philosophy, math, coding, and languages.
+
+ # Original Model Card
+
+ # Model Card for Mistral-Nemo-Instruct-2407
+
+ The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.
+
+ For more details about this model please refer to our release [blog post](https://mistral.ai/news/mistral-nemo/).
+
+ ## Key features
+ - Released under the **Apache 2 License**
+ - Pre-trained and instructed versions
+ - Trained with a **128k context window**
+ - Trained on a large proportion of **multilingual and code data**
+ - Drop-in replacement of Mistral 7B
+
+ ## Model Architecture
+ Mistral Nemo is a transformer model, with the following architecture choices:
+ - **Layers:** 40
+ - **Dim:** 5,120
+ - **Head dim:** 128
+ - **Hidden dim:** 14,336
+ - **Activation Function:** SwiGLU
+ - **Number of heads:** 32
+ - **Number of kv-heads:** 8 (GQA)
+ - **Vocabulary size:** 2**17 ~= 128k
+ - **Rotary embeddings (theta = 1M)**
+
+
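Reviewer note (not part of the commit): the architecture list added above can be sanity-checked with some quick arithmetic. The sketch below is a back-of-the-envelope calculation, not code from the model card. The constants are copied from the list. The helper names (`derived_shapes`, `kv_cache_bytes`, `approx_param_count`) are hypothetical. The parameter estimate assumes bias-free linear layers, untied input and output embeddings, and ignores normalization weights. The cache estimate assumes 2 bytes per value (e.g. fp16 or bf16).

```python
# Values copied from the "Model Architecture" list in the model card.
LAYERS = 40
DIM = 5120
HEAD_DIM = 128
HIDDEN_DIM = 14336
N_HEADS = 32
N_KV_HEADS = 8
VOCAB_SIZE = 2**17


def derived_shapes():
    """Quantities that follow directly from the listed hyperparameters."""
    return {
        "vocab_size": VOCAB_SIZE,
        "q_width": N_HEADS * HEAD_DIM,        # total width of the query projection
        "kv_width": N_KV_HEADS * HEAD_DIM,    # total width of each key/value projection
        "queries_per_kv_head": N_HEADS // N_KV_HEADS,  # GQA group size
    }


def kv_cache_bytes(context_tokens, bytes_per_value=2):
    """Rough KV-cache size: keys and values, per layer, per kv-head.

    Assumption: 2 bytes per cached value unless overridden.
    """
    per_token = 2 * LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * context_tokens


def approx_param_count():
    """Rough parameter count.

    Assumptions (not stated in the model card): no biases, untied
    input/output embeddings, normalization weights ignored.
    """
    q_and_o = 2 * DIM * (N_HEADS * HEAD_DIM)
    k_and_v = 2 * DIM * (N_KV_HEADS * HEAD_DIM)
    mlp = 3 * DIM * HIDDEN_DIM  # SwiGLU uses gate, up, and down projections
    embeddings = 2 * VOCAB_SIZE * DIM
    return LAYERS * (q_and_o + k_and_v + mlp) + embeddings


if __name__ == "__main__":
    print(derived_shapes())
    print(kv_cache_bytes(context_tokens=2**17) / 2**30, "GiB")
    print(approx_param_count() / 1e9, "B parameters")
```

Under these assumptions the estimate comes to roughly 12.2B parameters, consistent with the "12B" in the model name. The query projection width (32 × 128 = 4,096) is smaller than the model dimension (5,120). If "128k" is read as 2**17 tokens, a full-context KV cache for one sequence would be about 20 GiB at 2 bytes per value.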
  # Uploaded model
 
  - **Developed by:** EpistemeAI