---
license: apache-2.0
---
# Grok-1
---
_This repository contains the weights of the Grok-1 open-weights model._
**To get started with using the model, follow the instructions at** `github.com/xai-org/grok-1`.
![The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.](./model_logo.png)
<small>The cover image was generated using [Midjourney](https://midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>
---
```
         ╔══════════════════════════╗
         β•‘                  _____   β•‘
         β•‘           /\    |_   _|  β•‘
         β•‘ __  __   /  \     | |    β•‘
         β•‘ \ \/ /  / /\ \    | |    β•‘
         β•‘  >  <  / ____ \  _| |_   β•‘
         β•‘ /_/\_\/_/    \_\|_____|  β•‘
         β•‘                          β•‘
         β•‘ Understand the Universe  β•‘
         β•‘      [https://x.ai]      β•‘
         β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•β•β•
            ╔═════════╝╚═════════╗
            β•‘ xAI Grok-1 (314B)  β•‘
            β•šβ•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•
╔═════════════════════╝╚════════════════════╗
β•‘ 314B parameter Mixture of Experts model   β•‘
β•‘ - Base model (not finetuned)              β•‘
β•‘ - 8 experts (2 active)                    β•‘
β•‘ - 86B active parameters                   β•‘
β•‘ - Apache 2.0 license                      β•‘
β•‘ - Code: https://github.com/xai-org/grok-1 β•‘
β•‘ - Happy coding!                           β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
```
## Model Configuration Details
**Vocabulary Size**: 131,072

**Special Tokens**:
- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length**: 8,192
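
The tokenizer settings above can be sanity-checked against the bundled `tokenizer.model`. The following is a minimal sketch, assuming the `sentencepiece` Python package; it is not part of the official usage instructions.

```python
# Minimal sanity check of the tokenizer settings listed above.
# Assumes the `sentencepiece` package and the `tokenizer.model` file from
# this repository; this is not part of the official grok-1 code.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

print(sp.vocab_size())  # expected: 131072
print(sp.eos_id())      # expected: 2 (end-of-sequence token)
print(sp.pad_id())      # the model config uses 0; may report -1 if unset in tokenizer.model

# Encode a prompt into token ids.
ids = sp.encode("The answer to life, the universe, and everything is", out_type=int)
print(len(ids), ids[:8])
```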
### **Model Architecture**: MoE
- **Embedding Size**: 6,144
- Rotary Embedding (RoPE)
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
- **Tokenizer**: SentencePiece tokenizer
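
For reference, the hyperparameters listed above can be collected into a single configuration object. The sketch below is illustrative only; the field names are chosen for readability and do not necessarily match the configuration API in the xai-org/grok-1 code.

```python
# Illustrative summary of the architecture hyperparameters above.
# Field names are for readability only, not the exact xai-org/grok-1 API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Grok1Config:
    vocab_size: int = 131_072
    pad_token_id: int = 0
    eos_token_id: int = 2
    sequence_len: int = 8_192
    emb_size: int = 6_144           # embedding size
    num_layers: int = 64
    num_experts: int = 8            # MoE experts per layer
    num_selected_experts: int = 2   # experts active per token
    widening_factor: int = 8        # feed-forward width multiplier
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8           # grouped-query attention


cfg = Grok1Config()
# The embedding size matches 48 query heads x 128 key size.
assert cfg.emb_size == cfg.num_q_heads * cfg.key_size
```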
### **Inference Configuration**:
- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1
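
These settings describe single-host, 8-way model parallelism: a 1x8 local mesh with one sequence sharded across all eight devices, hence 1 / 8 = 0.125 sequences per device. Below is a hedged JAX sketch of building such a mesh; the axis names are illustrative, and the official runner constructs its own mesh.

```python
# Sketch of the 1x8 local mesh implied by the settings above: one host,
# eight local devices, 8-way model parallelism. Illustrative only; the
# official runner in xai-org/grok-1 builds its own mesh. Requires a
# machine with 8 accelerators visible to JAX.
import numpy as np
import jax
from jax.sharding import Mesh

devices = np.array(jax.devices()).reshape(1, 8)
mesh = Mesh(devices, axis_names=("data", "model"))

# "Batch size per device: 0.125" = one sequence sharded across the eight
# devices of the model axis: 1 / 8 = 0.125.
per_device_batch = 1 / devices.size
print(mesh, per_device_batch)
```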
## Inference Details
Make sure to download the `int8` checkpoint to the `checkpoints` directory and run
```shell
pip install -r requirements.txt
python transformer.py
```
to test the code.
You should see output from the language model.
Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
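
As a rough illustration of why, here is a back-of-the-envelope estimate of the weight memory alone; the arithmetic is illustrative, and actual requirements are higher once activations and the KV cache are included.

```python
# Back-of-the-envelope estimate of weight memory for the int8 checkpoint,
# to illustrate why a single GPU is not enough. Ignores activations, the
# KV cache, and framework overhead, so real requirements are higher.
total_params = 314e9      # 314B parameters
bytes_per_param = 1       # int8 weights
num_gpus = 8              # matches the 1x8 local mesh above

total_weight_gb = total_params * bytes_per_param / 1e9
per_gpu_gb = total_weight_gb / num_gpus
print(f"~{total_weight_gb:.0f} GB of weights in total, ~{per_gpu_gb:.0f} GB per GPU")
# -> ~314 GB in total, ~39 GB per GPU (weights alone)
```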
**p.s. we're hiring: https://x.ai/careers**