---
license: apache-2.0
---

# Grok-1

---

_This repository contains the weights of the Grok-1 open-weights model._

**To get started with using the model, follow the instructions at** `github.com/xai-org/grok-1`.

![Grok-1 cover image](./model_logo.png)

The cover image was generated using [Midjourney](https://midjourney.com) based on the following prompt proposed by Grok: *A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.*

---

```
         ╔══════════════════════════╗
         ║                 _______  ║
         ║           /\    |_   _|  ║
         ║ __  __   /  \     | |    ║
         ║ \ \/ /  / /\ \    | |    ║
         ║  > <   / ____ \  _| |_   ║
         ║ /_/\_\/_/    \_\_____|   ║
         ║                          ║
         ║ Understand the Universe  ║
         ║      [https://x.ai]      ║
         ╚════════════╗╔════════════╝
             ╔════════╝╚═════════╗
             ║ xAI Grok-1 (314B) ║
             ╚════════╗╔═════════╝
╔═════════════════════╝╚═════════════════════╗
║  314B parameter Mixture of Experts model   ║
║  - Base model (not finetuned)              ║
║  - 8 experts (2 active)                    ║
║  - 86B active parameters                   ║
║  - Apache 2.0 license                      ║
║  - Code: https://github.com/xai-org/grok-1 ║
║  - Happy coding!                           ║
╚════════════════════════════════════════════╝
```

## Model Configuration Details

**Vocabulary Size**: 131,072

**Special Tokens**:

- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length**: 8,192

### **Model Architecture**: MoE

- **Embedding Size**: 6,144
- Rotary Embedding (RoPE)
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
- **Tokenizer**: SentencePiece tokenizer

### **Inference Configuration**:

- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1

## Inference Details

Make sure to download the `int8` checkpoint to the `checkpoints` directory, then run

```shell
pip install -r requirements.txt
python run.py
```

to test the code. You should see sample output from the language model.

Due to the large size of the model (314B parameters), a machine with multiple GPUs is required to test the model with the example code. Two optional sanity-check sketches (tokenizer and expert routing) follow at the end of this card.

**p.s. we're hiring: https://x.ai/careers**
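
---

### Tokenizer Sanity Check (optional)

The configuration above lists a SentencePiece tokenizer with a 131,072-token vocabulary, a pad token of 0, and an EOS token of 2. The sketch below checks the bundled `./tokenizer.model` against those values. It assumes the `sentencepiece` Python package is available (`pip install sentencepiece`); that package is this sketch's assumption, not a documented requirement of the repository.

```python
# Sanity-check sketch for ./tokenizer.model (assumes `pip install sentencepiece`).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="./tokenizer.model")

# Expected values are the ones listed in this model card.
print(sp.vocab_size())  # expected: 131072
print(sp.pad_id())      # expected: 0 (pad token)
print(sp.eos_id())      # expected: 2 (end-of-sequence token)

# Round-trip a sample string through the tokenizer.
ids = sp.encode("Understand the Universe", out_type=int)
print(ids)
print(sp.decode(ids))
```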
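
### Expert Routing Sketch (optional)

The architecture section says 2 of 8 experts are active for each token. The snippet below is a generic top-2 gating sketch in JAX (the framework used by the example code), shown only to illustrate what "8 experts, 2 selected" means; it is not Grok-1's actual routing implementation, and the gating matrix and toy shapes are invented for the example.

```python
# Generic top-2-of-8 expert gating -- a toy illustration of MoE routing,
# not Grok-1's actual implementation.
import jax

NUM_EXPERTS = 8  # "Experts: 8" in the card
TOP_K = 2        # "Selected Experts: 2"

def route(x, w_gate):
    """Pick TOP_K experts per token.

    x:      [tokens, d_model] activations
    w_gate: [d_model, NUM_EXPERTS] gating weights (hypothetical)
    """
    logits = x @ w_gate                        # [tokens, NUM_EXPERTS]
    top_logits, expert_ids = jax.lax.top_k(logits, TOP_K)
    mix = jax.nn.softmax(top_logits, axis=-1)  # renormalize over the 2 picked
    return mix, expert_ids

# Toy shapes: 4 tokens with d_model=16 (Grok-1's real embedding size is 6,144).
x = jax.random.normal(jax.random.PRNGKey(0), (4, 16))
w_gate = jax.random.normal(jax.random.PRNGKey(1), (16, NUM_EXPERTS))
mix, expert_ids = route(x, w_gate)
print(expert_ids)  # indices of the 2 active experts for each token
print(mix)         # per-token mixing weights; each row sums to 1
```

Each token's output is then the `mix`-weighted sum of the two selected experts' outputs, which is why only a fraction of the 314B parameters (86B, per the card) participate in any single forward pass.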