---
license: apache-2.0
---
# Grok-1
---
_This repository contains the weights of the Grok-1 open-weights model._
**To get started with using the model, follow the instructions at** `github.com/xai-org/grok-1`.
![The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.](./model_logo.png)
<small>The cover image was generated using [Midjourney](https://midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>
---
```
         ╔══════════════════════════╗
         β•‘                  _____   β•‘
         β•‘           /\    |_   _|  β•‘
         β•‘ __  __   /  \     | |    β•‘
         β•‘ \ \/ /  / /\ \    | |    β•‘
         β•‘  >  <  / ____ \  _| |_   β•‘
         β•‘ /_/\_\/_/    \_\|_____|  β•‘
         β•‘                          β•‘
         β•‘ Understand the Universe  β•‘
         β•‘      [https://x.ai]      β•‘
         β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•β•β•
            ╔═════════╝╚═════════╗
            β•‘ xAI Grok-1 (314B)  β•‘
            β•šβ•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•
╔═════════════════════╝╚════════════════════╗
β•‘ 314B parameter Mixture of Experts model   β•‘
β•‘ - Base model (not finetuned)              β•‘
β•‘ - 8 experts (2 active)                    β•‘
β•‘ - 86B active parameters                   β•‘
β•‘ - Apache 2.0 license                      β•‘
β•‘ - Code: https://github.com/xai-org/grok-1 β•‘
β•‘ - Happy coding!                           β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
```
## Model Configuration Details
**Vocabulary Size**: 131,072

**Special Tokens**:
- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length**: 8,192
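
The tokenizer settings above can be sanity-checked against the bundled `tokenizer.model`. The following is a minimal sketch, assuming the `sentencepiece` Python package; it is not part of the official usage instructions.

```python
# Minimal sanity check of the tokenizer settings listed above.
# Assumes the `sentencepiece` package and the `tokenizer.model` file from
# this repository; this is not part of the official grok-1 code.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

print(sp.vocab_size())  # expected: 131072
print(sp.eos_id())      # expected: 2 (end-of-sequence token)
print(sp.pad_id())      # the model config uses 0; may report -1 if unset in tokenizer.model

# Encode a prompt into token ids.
ids = sp.encode("The answer to life, the universe, and everything is", out_type=int)
print(len(ids), ids[:8])
```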
### **Model Architecture**: MoE
- **Embedding Size**: 6,144
- Rotary Embedding (RoPE)
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
- **Tokenizer**: SentencePiece tokenizer
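
For reference, the hyperparameters listed above can be collected into a single configuration object. The sketch below is illustrative only; the field names are chosen for readability and do not necessarily match the configuration API in the xai-org/grok-1 code.

```python
# Illustrative summary of the architecture hyperparameters above.
# Field names are for readability only, not the exact xai-org/grok-1 API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Grok1Config:
    vocab_size: int = 131_072
    pad_token_id: int = 0
    eos_token_id: int = 2
    sequence_len: int = 8_192
    emb_size: int = 6_144           # embedding size
    num_layers: int = 64
    num_experts: int = 8            # MoE experts per layer
    num_selected_experts: int = 2   # experts active per token
    widening_factor: int = 8        # feed-forward width multiplier
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8           # grouped-query attention


cfg = Grok1Config()
# The embedding size matches 48 query heads x 128 key size.
assert cfg.emb_size == cfg.num_q_heads * cfg.key_size
```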
### **Inference Configuration**:
- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1
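
These settings describe single-host, 8-way model parallelism: a 1x8 local mesh with one sequence sharded across all eight devices, hence 1 / 8 = 0.125 sequences per device. Below is a hedged JAX sketch of building such a mesh; the axis names are illustrative, and the official runner constructs its own mesh.

```python
# Sketch of the 1x8 local mesh implied by the settings above: one host,
# eight local devices, 8-way model parallelism. Illustrative only; the
# official runner in xai-org/grok-1 builds its own mesh. Requires a
# machine with 8 accelerators visible to JAX.
import numpy as np
import jax
from jax.sharding import Mesh

devices = np.array(jax.devices()).reshape(1, 8)
mesh = Mesh(devices, axis_names=("data", "model"))

# "Batch size per device: 0.125" = one sequence sharded across the eight
# devices of the model axis: 1 / 8 = 0.125.
per_device_batch = 1 / devices.size
print(mesh, per_device_batch)
```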
## Inference Details
Make sure to download the `int8` checkpoint to the `checkpoints` directory and run
```shell
pip install -r requirements.txt
python transformer.py
```
to test the code.
You should see output from the language model.
Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
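
As a rough illustration of why, here is a back-of-the-envelope estimate of the weight memory alone; the arithmetic is illustrative, and actual requirements are higher once activations and the KV cache are included.

```python
# Back-of-the-envelope estimate of weight memory for the int8 checkpoint,
# to illustrate why a single GPU is not enough. Ignores activations, the
# KV cache, and framework overhead, so real requirements are higher.
total_params = 314e9      # 314B parameters
bytes_per_param = 1       # int8 weights
num_gpus = 8              # matches the 1x8 local mesh above

total_weight_gb = total_params * bytes_per_param / 1e9
per_gpu_gb = total_weight_gb / num_gpus
print(f"~{total_weight_gb:.0f} GB of weights in total, ~{per_gpu_gb:.0f} GB per GPU")
# -> ~314 GB in total, ~39 GB per GPU (weights alone)
```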
**p.s. we're hiring: https://x.ai/careers**