ariel-ml committed on
Commit
65d9584
1 Parent(s): 57064fb

Upload README.md with huggingface_hub

---
license: llama2
language:
- hu
- en
tags:
- puli
- llama
- finetuned
base_model: ariel-ml/PULI-LlumiX-32K-instruct-f16-0.2
pipeline_tag: text-generation
---

# PULI LlumiX 32K instruct (6.74 billion parameters)

<img src="logo.webp" width="340" style="margin-left:auto; margin-right:auto; display:block;"/>

Instruct-finetuned version of NYTK/PULI-LlumiX-32K.

## Provided files

| Quant method | Bits | Use case |
| ---- | ---- | ---- |
| Q3_K_M | 3 | very small, high quality loss |
| Q4_K_S | 4 | small, greater quality loss |
| Q4_K_M | 4 | medium, balanced quality - recommended |
| Q5_K_S | 5 | large, low quality loss - recommended |
| Q5_K_M | 5 | large, very low quality loss - recommended |
| Q6_K | 6 | very large, extremely low quality loss |
| Q8_0 | 8 | very large, extremely low quality loss - not recommended |

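The quantised GGUF files can be loaded with any GGUF-compatible runtime. Below is a minimal loading sketch using llama-cpp-python and huggingface_hub; the repo id and filename are assumptions, so check this repository's file list for the exact names.

```
# Minimal loading sketch (llama-cpp-python is an assumed runtime, not part of this card).
# The repo_id and filename are placeholders - check the repository's file list for the
# exact GGUF names.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="ariel-ml/PULI-LlumiX-32K-instruct-GGUF",   # assumed repo id
    filename="puli-llumix-32k-instruct.Q5_K_M.gguf",    # assumed filename (Q5_K_M quant)
)

# 32K context window per the base model; lower n_ctx to reduce memory use.
llm = Llama(model_path=model_path, n_ctx=32768)
```
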
## Training platform

[Runpod](https://runpod.io) RTX 4090 GPU

## Hyperparameters

- Epochs: 3
- LoRA rank (r): 16
- LoRA alpha: 16
- Learning rate: 2e-4
- LR scheduler: cosine
- Optimizer: adamw_8bit
- Weight decay: 0.01

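The sketch below shows one way these hyperparameters could be expressed with the Hugging Face peft and transformers libraries. It is not the original training script; dropout, batch size, and gradient accumulation are assumptions.

```
# Sketch only: mapping the listed hyperparameters onto peft/transformers config objects.
# Values marked as assumptions are not stated in this card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=16,         # LoRA alpha
    lora_dropout=0.0,      # assumption: dropout not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="puli-llumix-32k-instruct-lora",
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",          # the card lists adamw_8bit (bitsandbytes 8-bit AdamW)
    weight_decay=0.01,
    fp16=True,                       # the model is float16 (see Limitations)
    per_device_train_batch_size=1,   # assumption: single RTX 4090
    gradient_accumulation_steps=8,   # assumption
)
```
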
## Dataset

boapps/szurkemarha

Only the Hungarian instructions were selected: ~53 000 prompts.

## Prompt format: ChatML

```
<|im_start|>system
Egy segítőkész mesterséges intelligencia asszisztens vagy. Válaszold meg a kérdést legjobb tudásod szerint!<|im_end|>
<|im_start|>user
Ki a legerősebb szuperhős?<|im_end|>
<|im_start|>assistant
A legerősebb szuperhős a Marvel univerzumában Hulk.<|im_end|>
```

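The same template can be assembled programmatically. The sketch below reuses the `llm` instance from the loading example in the "Provided files" section; the sampling settings are illustrative, not recommendations from this card.

```
# Sketch: building the ChatML prompt above and generating with llama-cpp-python.
# Assumes `llm` is the Llama instance created in the loading example.
system = ("Egy segítőkész mesterséges intelligencia asszisztens vagy. "
          "Válaszold meg a kérdést legjobb tudásod szerint!")
user = "Ki a legerősebb szuperhős?"

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

out = llm(prompt, max_tokens=256, temperature=0.7, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```
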
## Base model

- Trained with OpenChatKit [github](https://github.com/togethercomputer/OpenChatKit)
- The [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) model was continually pretrained on a Hungarian dataset
- The model has been extended to a context length of 32K with position interpolation
- Checkpoint: 100 000 steps

## Base model dataset for continued pretraining

- Hungarian: 7.9 billion words from 763K documents, each exceeding 5000 words in length
- English: Long Context QA (2 billion words), BookSum (78 million words)

## Limitations

- max_seq_length = 32 768
- float16
- vocab size: 32 000