---
license: apache-2.0
---

Grok-1


This repository contains the weights of the Grok-1 open-weights model.

To get started with using the model, follow the instructions at github.com/xai-org/grok-1.

The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.

                     ╔══════════════════════════╗
                     β•‘                 _______  β•‘
                     β•‘            /\   |_   _|  β•‘
                     β•‘  __  __   /  \    | |    β•‘
                     β•‘  \ \/ /  / /\ \   | |    β•‘
                     β•‘   >  <  / ____ \ _| |_   β•‘
                     β•‘  /_/\_\/_/    \_\_____|  β•‘
                     β•‘                          β•‘
                     β•‘  Understand the Universe β•‘
                     β•‘      [https://x.ai]      β•‘
                     β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•β•β•
                         β•”β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•—
                         β•‘ xAI Grok-1 (314B) β•‘
                         β•šβ•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•
            β•”β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•—
            β•‘ 314B parameter Mixture of Experts model    β•‘
            β•‘ - Base model (not finetuned)               β•‘
            β•‘ - 8 experts (2 active)                     β•‘
            β•‘ - 86B active parameters                    β•‘
            β•‘ - Apache 2.0 license                       β•‘
            β•‘ - Code: https://github.com/xai-org/grok-1  β•‘
            β•‘ - Happy coding!                            β•‘
            β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Model Configuration Details

Vocabulary Size: 131,072

Special Tokens:

  • Pad Token: 0
  • End of Sequence Token: 2

Sequence Length: 8192
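
As a concrete illustration, here is a minimal sketch (not code from the repository) of how these values come into play when encoding a prompt with the bundled SentencePiece model; the tokenizer.model path is taken from the inference configuration below:

```python
import sentencepiece as spm

# Load the bundled SentencePiece model (see ./tokenizer.model below).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

PAD_ID, EOS_ID, SEQ_LEN = 0, 2, 8192

ids = sp.encode("Understand the universe.")   # token ids, each < 131,072
ids = ids + [EOS_ID]                          # terminate with the EOS token (id 2)
ids = ids + [PAD_ID] * (SEQ_LEN - len(ids))   # right-pad to the 8192-token context
```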

Model Architecture: MoE

  • Embedding Size: 6,144
    • Rotary Embedding (RoPE)
  • Layers: 64
  • Experts: 8
  • Selected Experts: 2
  • Widening Factor: 8
  • Key Size: 128
  • Query Heads: 48
  • Key Value Heads: 8
  • Activation Sharding: Data-wise, Model-wise
  • Tokenizer: SentencePiece
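
These hyperparameters correspond to the model configuration set up in run.py of the code repository; a condensed sketch follows, with field names matching that file and values from the list above, but treat it as illustrative rather than verbatim:

```python
from model import LanguageModelConfig, TransformerConfig  # from the grok-1 repo

grok_1_model = LanguageModelConfig(
    vocab_size=128 * 1024,        # 131,072
    pad_token=0,
    eos_token=2,
    sequence_len=8192,
    model=TransformerConfig(
        emb_size=48 * 128,        # 6,144
        widening_factor=8,
        key_size=128,
        num_q_heads=48,
        num_kv_heads=8,
        num_layers=64,
        num_experts=8,            # MoE: 8 experts...
        num_selected_experts=2,   # ...2 active per token
        data_axis="data",         # activation sharding axes
        model_axis="model",
    ),
)
```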

Inference Configuration:

  • Batch Size per Device: 0.125 (i.e., one sequence sharded across the eight local devices)
  • Tokenizer: ./tokenizer.model
  • Local Mesh: 1x8
  • Between Hosts: 1x1
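
A minimal JAX sketch of the device mesh these settings describe: a 1x8 grid of local accelerators with no sharding across hosts, and axis names assumed to match the data-wise/model-wise activation sharding above.

```python
import numpy as np
import jax
from jax.sharding import Mesh

# Local mesh 1x8: all eight accelerators on a single host (between-hosts mesh is 1x1).
devices = np.array(jax.devices()).reshape(1, 8)
mesh = Mesh(devices, axis_names=("data", "model"))
```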

Inference Details

Make sure to download the int8 checkpoint to the checkpoints directory and run

```shell
pip install -r requirements.txt
python transformer.py
```

to test the code.
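
If the checkpoint is not in place yet, one way to fetch it is with huggingface_hub; a sketch, where the repo id and checkpoint folder name (ckpt-0) are assumed to match the upstream xai-org/grok-1 weights rather than taken from this page:

```python
from huggingface_hub import snapshot_download

# Fetch only the int8 checkpoint files into ./checkpoints.
snapshot_download(
    repo_id="xai-org/grok-1",      # assumption: upstream weights; adjust to your repo
    allow_patterns=["ckpt-0/*"],   # assumption: upstream checkpoint folder name
    local_dir="checkpoints",
)
```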

You should see output from the language model.

Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.

p.s. we're hiring: https://x.ai/careers