amezasor committed on
Commit 6e599c1
1 Parent(s): 330dca1

updates after review

Files changed (1)
  1. README.md +20 -21
README.md CHANGED
@@ -201,29 +201,29 @@ model-index:
  veriefied: false
  ---

+ <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
  <!-- ![image/png](granite-3_0-language-models_Group_1.png) -->

  # Granite-3.0-3B-A800M-Base

- ## Model Summary
- **Granite-3.0-3B-A800M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-3B-A800M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
+ **Model Summary:**
+ Granite-3.0-3B-A800M-Base is a decoder-only language model designed to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 8 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.

  - **Developers:** IBM Research
  - **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
  - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
- - **Paper:** [Granite 3.0 Language Models]()
+ - **Paper:** [Granite 3.0 Language Models](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/granite-3-language-models.pdf)
  - **Release Date**: October 21st, 2024
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

- ## Supported Languages
- English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
+ **Supported Languages:**
+ English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 3.0 models for languages beyond these 12.

- ## Usage
- ### Intended use
+ **Intended use:**
  Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as a baseline to create specialized models for specific application scenarios.

- ### Generation
- This is a simple example of how to use **Granite-3.0-3B-A800M-Base** model.
+ **Generation:**
+ This is a simple example of how to use the Granite-3.0-3B-A800M-Base model.

  Install the following libraries:

@@ -255,8 +255,8 @@ output = tokenizer.batch_decode(output)
  print(output)
  ```

- ## Model Architeture
- **Granite-3.0-3B-A800M-Base** is based on a decoder-only sparse Mixture of Experts(MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss.
+ **Model Architecture:**
+ Granite-3.0-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss.

  | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
  | :-------- | :--------| :--------| :--------| :-------- |
@@ -276,19 +276,18 @@ print(output)
  | # Active Parameters | 2.5B | 8.1B | 400M | **800M** |
  | # Training tokens | 12T | 12T | 10T | **10T** |

- <!-- TO DO: To be completed once the paper is ready -->
- ## Training Data
- This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
- * Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
- * Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+ **Training Data:**
+ This model is trained on a mix of open-source and proprietary data following a two-stage training strategy.
+ * Stage 1 data: The data for stage 1 is sourced from diverse domains, such as web, code, academic sources, books, and math data.
+ * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training stage is to enhance the model’s performance on specific tasks.

- ## Infrastructure
- We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
+ **Infrastructure:**
+ We train Granite 3.0 Language Models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.

- ## Ethical Considerations and Limitations
+ **Ethical Considerations and Limitations:**
  The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. The **Granite-3.0-3B-A800M-Base** model is no exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use the **Granite-3.0-3B-A800M-Base** model with ethical intentions and in a responsible way.

- ## Citation
+ <!-- ## Citation
  ```
  @misc{granite-models,
  author = {author 1, author2, ...},
@@ -298,4 +297,4 @@ The use of Large Language Models involves risks and ethical considerations peopl
  year = {2024},
  url = {https://arxiv.org/abs/0000.00000},
  }
- ```
+ ``` -->
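The architecture paragraph in this README names three MoE components (fine-grained experts, dropless token routing, and a load balancing loss) but does not show how they fit together. The code below is a generic, minimal illustration in plain Python. It is not Granite's implementation. Each token's router logits are converted to probabilities, and the token is sent to its top-k experts with no per-expert capacity cap, so no token is dropped. A Switch-Transformer-style auxiliary loss, N · Σ f_i · P_i, then measures how evenly tokens are spread across experts. The following are assumptions made for illustration:

- the expert count and `top_k`
- the toy logits
- the hypothetical function names `route_tokens` and `load_balancing_loss`
- the exact loss formulation

The expert feed-forward networks themselves are omitted.

```python
import math
from collections import defaultdict

# Hypothetical helpers: a generic MoE routing sketch, not Granite's training code.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(gate_logits, top_k):
    """Top-k routing with no capacity limit ("dropless").

    gate_logits: one list of per-expert router logits per token.
    Returns (assignments, probs):
      assignments maps expert id -> list of (token id, renormalized gate weight)
      probs holds each token's full softmax distribution over experts.
    """
    assignments = defaultdict(list)
    probs = []
    for t, logits in enumerate(gate_logits):
        p = softmax(logits)
        probs.append(p)
        # Pick the k highest-probability experts (stable sort: ties go to lower ids).
        chosen = sorted(range(len(p)), key=lambda e: p[e], reverse=True)[:top_k]
        norm = sum(p[e] for e in chosen)
        for e in chosen:
            # Dropless: every token is appended; no expert has a capacity cap.
            assignments[e].append((t, p[e] / norm))
    return assignments, probs


def load_balancing_loss(assignments, probs, num_experts, top_k):
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i (assumed form).

    f_i: fraction of all routed slots (tokens * top_k) assigned to expert i.
    P_i: mean router probability for expert i across tokens.
    """
    num_tokens = len(probs)
    loss = 0.0
    for i in range(num_experts):
        f_i = len(assignments.get(i, [])) / (num_tokens * top_k)
        p_i = sum(p[i] for p in probs) / num_tokens
        loss += f_i * p_i
    return num_experts * loss


if __name__ == "__main__":
    # Toy setup (assumed): 4 tokens, 8 small experts, top-2 routing.
    toy_logits = [
        [3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 3.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 2.0],
    ]
    routed, router_probs = route_tokens(toy_logits, top_k=2)
    for expert in sorted(routed):
        print(expert, routed[expert])
    print("aux loss:", load_balancing_loss(routed, router_probs, num_experts=8, top_k=2))
```

When every expert receives the same share of routed slots, each f_i equals 1/N. The loss then reduces to the sum of the mean router probabilities, which is 1.0. Routing every token to the same two experts pushes it above 1.0, and that is the imbalance such an auxiliary loss is meant to penalize during training.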