amezasor committed on
Commit 0d1d12f
1 Parent(s): 4ac109e

data update

Files changed (1):
  1. README.md +4 -3
README.md CHANGED
@@ -209,7 +209,7 @@ model-index:
 # Granite-3.0-3B-A800M-Base
 
 ## Model Summary
-**Granite-3.0-3B-A800M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-3B-A800M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
+**Granite-3.0-3B-A800M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-3B-A800M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 
 - **Developers:** IBM Research
@@ -281,9 +281,10 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary datasets.
+This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
-<!-- CHECK: removed Vela, only talk about blue-vela-->
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
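For context, the second hunk's trailing `print(output)` line points at the README's existing usage example, which this commit leaves untouched. Readers seeing only this diff can picture that usage with the minimal sketch below; it assumes the standard Hugging Face `transformers` generation API and the `ibm-granite/granite-3.0-3b-a800m-base` repo id, and it is an illustration rather than the README's exact snippet.

```python
# Hypothetical usage sketch (not part of this commit): load the base model and
# generate a short text completion with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-3b-a800m-base"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# Base (non-instruct) model, so use a plain completion-style prompt.
prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```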