amezasor committed on
Commit
aba2d80
1 Parent(s): 8db5200

training data word choice fix

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -277,9 +277,9 @@ Granite-3.0-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts(M
  | # Training tokens | 12T | 12T | 10T | **10T** |

  **Training Data:**
- This model is trained on a mix of open source and proprietary data following a two-phase training strategy.
- * Stage 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
- * Stage 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+ This model is trained on a mix of open source and proprietary data following a two-stage training strategy.
+ * Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+ * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.

  **Infrastructure:**
  We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.