amezasor committed
Commit 7a178b2 • 1 Parent(s): 49d98f4

instruct model card - initial commit

Files changed (1):
  1. README.md +319 -3
README.md CHANGED
@@ -1,3 +1,319 @@
- ---
- license: apache-2.0
- ---
+ ---
+ pipeline_tag: text-generation
+ inference: false
+ license: apache-2.0
+ # datasets:
+ # metrics:
+ # - code_eval
+ library_name: transformers
+ tags:
+ - language
+ - granite-3.0
+ model-index:
+ - name: granite-3.0-8b-instruct
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: human-exams
+       name: MMLU
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: human-exams
+       name: MMLU-Pro
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: human-exams
+       name: AGI-Eval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: WinoGrande
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: OBQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: SIQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: PIQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: Hellaswag
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: TruthfulQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reading-comprehension
+       name: BoolQ
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reading-comprehension
+       name: SQuAD v2
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reasoning
+       name: ARC-C
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reasoning
+       name: GPQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reasoning
+       name: BBH
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: code
+       name: HumanEval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: code
+       name: MBPP
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: math
+       name: GSM8K
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: math
+       name: MATH
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: multilingual
+       name: MGSM
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+ ---
+
+ <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
+
+ # Granite-3.0-8B-Instruct
+
+ ## Model Summary
+ **Granite-3.0-8B-Instruct** is a lightweight, open-source 8B-parameter model fine-tuned from *Granite-3.0-8B-Base* on a combination of permissively licensed open-source and proprietary instruction data. This language model is designed to excel at instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, function-calling, and more.
+ <!-- The lightweight and open-source nature of this model makes it an excellent choice to serve as backbone of real-time applications such as chatbots and conversational agents. -->
+
+ - **Developers:** IBM Research
+ - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
+ - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
+ - **Paper:** [Granite Language Models](https://) <!-- TO DO: Update github repo link when it is ready -->
+ - **Release Date**: October 21st, 2024
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Supported Languages
+ English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified)
+
+ ## Usage
+ ### Intended use
+ The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
+
+ ### Capabilities
+ * Summarization
+ * Text classification
+ * Text extraction
+ * Question-answering
+ * Retrieval Augmented Generation (RAG)
+ * Code-related tasks
+ * Function-calling
+ * Multilingual dialog use cases
+
+ ### Generation
+ This is a simple example of how to use the **Granite-3.0-8B-Instruct** model.
+
+ Install the following libraries:
+
+ ```shell
+ pip install torch torchvision torchaudio
+ pip install accelerate
+ pip install transformers
+ ```
+ Then, copy the snippet from the section that is relevant for your use case.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "cuda"  # use "cpu" if no GPU is available
+ model_path = "ibm-granite/granite-3.0-8b-instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ # drop device_map if running on CPU
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
+ model.eval()
+ # change input text as desired
+ chat = [
+     { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
+ ]
+ chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ # tokenize the text
+ input_tokens = tokenizer(chat, return_tensors="pt").to(device)
+ # generate output tokens
+ output = model.generate(**input_tokens, max_new_tokens=100)
+ # decode output tokens into text
+ output = tokenizer.batch_decode(output)
+ # print output
+ print(output)
+ ```
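+
+ By default, `generate` decodes greedily. For more varied outputs you can enable sampling; the snippet below is a minimal sketch that reuses `model`, `tokenizer`, and `input_tokens` from the example above, and the sampling values shown are illustrative rather than tuned recommendations.
+
+ ```python
+ # sample instead of greedy decoding; temperature/top_p values are illustrative
+ output = model.generate(**input_tokens,
+                         max_new_tokens=100,
+                         do_sample=True,
+                         temperature=0.7,
+                         top_p=0.95)
+ print(tokenizer.batch_decode(output))
+ ```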
+
+ <!-- TO DO: function-calling-example
+ -->
+
+ <!-- ['<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>1. IBM Research - Almaden, San Jose, California<|end_of_text|>'] -->
+
+ ## Model Architecture
+ **Granite-3.0-8B-Instruct** is based on a decoder-only dense transformer architecture. Its core components are Grouped-Query Attention (GQA), Rotary Position Embeddings (RoPE), MLPs with SwiGLU activation, RMSNorm, and shared input/output embeddings.
+
+ | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
+ | :-------- | :--------| :-------- | :------| :------|
+ | Embedding size | 2048 | **4096** | 1024 | 1536 |
+ | Number of layers | 40 | **40** | 24 | 32 |
+ | Attention head size | 64 | **128** | 64 | 64 |
+ | Number of attention heads | 32 | **32** | 16 | 24 |
+ | Number of KV heads | 8 | **8** | 8 | 8 |
+ | MLP hidden size | 8192 | **12800** | 512 | 512 |
+ | MLP activation | SwiGLU | **SwiGLU** | SwiGLU | SwiGLU |
+ | Number of Experts | — | **—** | 32 | 40 |
+ | MoE TopK | — | **—** | 8 | 8 |
+ | Initialization std | 0.1 | **0.1** | 0.1 | 0.1 |
+ | Sequence Length | 4096 | **4096** | 4096 | 4096 |
+ | Position Embedding | RoPE | **RoPE** | RoPE | RoPE |
+ | # Parameters | 2.5B | **8.1B** | 1.3B | 3.3B |
+ | # Active Parameters | 2.5B | **8.1B** | 400M | 800M |
+ | # Training tokens | 12T | **12T** | 10T | 10T |
+
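+ As a quick sanity check, the hyperparameters in the table can be read back from the model's configuration. This is a minimal sketch; it assumes the checkpoint exposes the usual `transformers` attribute names (`hidden_size`, `num_hidden_layers`, etc.), which may differ for a given config class.
+
+ ```python
+ from transformers import AutoConfig
+
+ config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-8b-instruct")
+ # expected per the 8B Dense column: 4096 / 40 / 32 / 8
+ print(config.hidden_size)           # embedding size
+ print(config.num_hidden_layers)     # number of layers
+ print(config.num_attention_heads)   # number of attention heads
+ print(config.num_key_value_heads)   # number of KV heads (GQA)
+ ```
+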
+ <!-- TO DO: To be completed once the paper is ready; we may change the title to Supervised Finetuning -->
+ ## Training Data
+ This model is trained on a mix of open-source and proprietary datasets.
+ <!-- ### Instruction Datasets
+ * Language Instruction Datasets: We include high-quality datasets such as [TO DO: List of datasets]
+ * Synthetic Instruction Datasets: [TO DO: paragraph about synthetic data]
+ ### Processing
+ * [TO DO: Data annotation with MagPie pipeline: quality, duplicates] -->
+
+ <!-- CHECK: removed Vela, only talk about blue-vela-->
+ ## Infrastructure
+ We train the Granite language models on IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models across thousands of GPUs.
+
+ <!-- TO DO: Check multilingual statement once the paper is ready -->
+ ## Ethical Considerations and Limitations
+ Granite instruct models are primarily finetuned on instruction-response pairs, mostly in English, but also in German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese (Simplified). Since this model has been exposed to multilingual data, it can handle multilingual dialog use cases, though with limited performance on non-English tasks. In such cases, introducing a small number of examples (few-shot prompting) can help the model generate more accurate outputs, as sketched below. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to the *[Granite-3.0-8B-Base](https://huggingface.co/ibm-granite/granite-3.0-8b-base)* model card.
+
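+ The snippet below is a minimal sketch of few-shot prompting for a non-English task, reusing `model`, `tokenizer`, and `device` from the Generation example; the prompt contents are illustrative only.
+
+ ```python
+ # provide two solved examples before the real query (few-shot prompting)
+ chat = [
+     { "role": "user", "content": "Translate to German: Good morning." },
+     { "role": "assistant", "content": "Guten Morgen." },
+     { "role": "user", "content": "Translate to German: Thank you very much." },
+     { "role": "assistant", "content": "Vielen Dank." },
+     { "role": "user", "content": "Translate to German: See you tomorrow." },
+ ]
+ prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
+ output = model.generate(**input_tokens, max_new_tokens=50)
+ print(tokenizer.batch_decode(output))
+ ```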