Gromenauer-7B

Overview

Gromenauer-7B is a Spanish language model designed to understand and generate high-quality Spanish text. It is built on the Mistral architecture and was trained on an extensive literary corpus, so that it captures the wide range of linguistic nuances, styles, and contexts found in Spanish literature.

Model Details

  • Model Type: Mistral
  • Sequence Length: 8192
  • Hidden Dimension: 4096
  • Intermediate Dimension: 14336
  • Number of Layers: 32
  • Number of Attention Heads: 32
  • Number of Key-Value Heads: 8
  • Activation Function: SiLU
  • Initializer Range: 0.02
  • Layer Norm Epsilon: 1.0e-05
  • Use Flash Attention: Yes
  • Gradient Checkpointing: Enabled (Block Size: 5)
  • Sliding Window Attention: 4096
  • Use Bias: No
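
As a sanity check on the dimensions above, the parameter count can be estimated by hand. The sketch below assumes a standard Mistral-style decoder block (grouped-query attention, a gated MLP with three projections, two norm weight vectors per layer, no biases), untied input and output embeddings, and a vocabulary of 32000 tokens, which is the size of the mistralai/Mistral-7B-v0.1 tokenizer. None of these layout details are stated in this card, so treat the result as an estimate.

```python
# Rough parameter count for a Mistral-style decoder with the listed dimensions.
hidden = 4096          # Hidden Dimension
intermediate = 14336   # Intermediate Dimension
layers = 32            # Number of Layers
heads = 32             # Number of Attention Heads
kv_heads = 8           # Number of Key-Value Heads
vocab = 32000          # assumption: Mistral-7B-v0.1 tokenizer vocabulary size

head_dim = hidden // heads      # 128 dimensions per attention head
kv_dim = kv_heads * head_dim    # 1024: keys/values are shared across head groups

# Attention: query and output projections are hidden x hidden,
# key and value projections are hidden x kv_dim (no biases).
attn = 2 * hidden * hidden + 2 * hidden * kv_dim
# Gated MLP: gate, up, and down projections.
mlp = 3 * hidden * intermediate
# Two norm weight vectors per layer.
norms = 2 * hidden
per_layer = attn + mlp + norms

# Token embeddings + output head (assumed untied) + final norm.
total = layers * per_layer + 2 * vocab * hidden + hidden

print(f"{total:,} parameters")                   # 7,241,732,096 parameters
print(f"{total * 4 / 1e9:.1f} GB in float32")    # 4 bytes per parameter
```

Under these assumptions the count comes to roughly 7.24B parameters, which at 4 bytes each is about 29 GB of weights in 32-bit floats.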

Training Details

  • Tokenizer: mistralai/Mistral-7B-v0.1
  • Batch Size: 512
  • Learning Rate: 1e-5
  • Optimizer: Adam with beta1=0.9, beta2=0.95, epsilon=1e-8
  • Weight Decay: 0.1
  • Warmup Steps: 200
  • Learning Rate Schedule: Cosine
  • Number of Training Steps: 7000
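
The schedule described above can be sketched as a small function. This is only an illustration: the card does not say whether the warmup is linear or what the final learning rate is, so the sketch assumes a linear warmup to the peak of 1e-5 followed by a cosine decay to zero (the `min_lr` parameter is hypothetical). The token estimate likewise assumes that the batch size counts sequences and that every sequence is packed to the full 8192 tokens.

```python
import math

PEAK_LR = 1e-5       # Learning Rate
WARMUP_STEPS = 200   # Warmup Steps
TOTAL_STEPS = 7000   # Number of Training Steps

def learning_rate(step, min_lr=0.0):
    """Linear warmup followed by cosine decay (assumed shape, see lead-in)."""
    if step < WARMUP_STEPS:
        # Assumption: the warmup ramps linearly from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (PEAK_LR - min_lr) * cosine

# Upper bound on tokens seen, assuming 512 full-length sequences per step.
BATCH_SIZE = 512
SEQ_LEN = 8192
tokens_per_step = BATCH_SIZE * SEQ_LEN         # 4,194,304
total_tokens = tokens_per_step * TOTAL_STEPS   # 29,360,128,000

for step in (0, 100, 200, 3600, 7000):
    print(step, learning_rate(step))
```

Under those assumptions, training would cover at most about 29.4B tokens.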

Usage

To load the model in your project, you can use the following code:

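from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bertin-project/Gromenauer-7B")

# Load the model with its language-modeling head, so it can also generate text
model = AutoModelForCausalLM.from_pretrained("bertin-project/Gromenauer-7B")

# Example usage: a single forward pass; outputs.logits holds next-token scores
text = "Introduce aquí tu texto en español."  # "Enter your Spanish text here."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)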
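
To produce a continuation rather than raw model outputs, the model can be loaded with a causal language-modeling head and sampled with `generate`. The following is a minimal sketch, not a recommended configuration: the helper name `generate_spanish` and the sampling values (`temperature=0.7`, `top_p=0.9`, 64 new tokens) are illustrative assumptions and do not come from this model card.

```python
def generate_spanish(prompt, model_id="bertin-project/Gromenauer-7B", max_new_tokens=64):
    """Sample a continuation of `prompt` (hypothetical helper, illustrative settings)."""
    # Heavy dependencies are imported lazily, only when text is actually generated.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,      # sample instead of greedy decoding
            temperature=0.7,     # assumed value, tune for your use case
            top_p=0.9,           # assumed value, tune for your use case
        )
    # The decoded string contains the prompt followed by the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_spanish("En un lugar de la Mancha,")` downloads the weights on first use and returns the prompt followed by the sampled text. The checkpoint is stored in 32-bit floats, so loading it this way needs roughly 29 GB of memory.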

Model size: 7.24B params (Safetensors, tensor type F32)
