
Model Card for Llama 3 Instruct Fine-Tuned on Deutsche Bahn FAQ

Model Overview

Model Name: islam-hajosman/llama3_instruct_fine_tuned_bahn_1k_v1_model
Architecture: Llama 3 Instruct
Quantization: 4-bit NF4 with double quantization
Domain-Specific Fine-Tuning Dataset: islam-hajosman/deutsche_bahn_faq_1k

This model has been fine-tuned to answer frequently asked questions (FAQ) from the Deutsche Bahn website. It was created as part of a Master's thesis project aimed at improving the model's domain-specific capabilities.

Fine-Tuning Configuration

Quantization Configuration

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization; compute in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
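
For completeness, this is how the quantized base model would be loaded with the configuration above; the base checkpoint name below is an assumption, as the card only states "Llama 3 Instruct" with roughly 8B parameters.

from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=quantization_config,
    device_map="auto",
)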

LoRA Configuration

from peft import LoraConfig, get_peft_model

# LoRA adapters on all attention and MLP projection matrices
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'],
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
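
As a quick sanity check, PEFT can report the trainable-parameter share, which should roughly match the 0.915% figure in the training summary below.

# Prints trainable vs. total parameters, e.g. "trainable%: ~0.9"
model.print_trainable_parameters()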

Training Arguments

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama3_instruct_fine_tuned_bahn_1k_v1_output",
    dataloader_drop_last=False,
    save_strategy="epoch",
    logging_strategy="steps",
    num_train_epochs=30,
    logging_steps=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,  # effective batch size: 8 x 8 = 64
    optim="adamw_8bit",
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    weight_decay=0.0,
    run_name="llama3_instruct_fine_tuned_bahn_1k_v1_report",
    report_to="wandb"
)

Sequence Length Configuration

  • max_seq_length was set to 512, which covers 99.3% of the data; the 7 entries exceeding this length were truncated (see the sketch below).
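
A minimal training-loop sketch, assuming trl's SFTTrainer was used (the card does not name the trainer) and that the dataset exposes a "text" column:

from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("islam-hajosman/deutsche_bahn_faq_1k", split="train")

# Depending on the trl version, max_seq_length and dataset_text_field are
# passed to SFTTrainer directly (older releases) or via SFTConfig (newer).
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_seq_length=512,
    dataset_text_field="text",  # assumed column name
)
trainer.train()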

Hardware Used

  • GPU: 1x H100 (80 GB PCIe)
  • CPU: 26 cores
  • RAM: 205.4 GB
  • Storage: 1.1 TB SSD
  • Cost: $2.50 per hour

Training Summary

  • Trainable Parameters: 0.915% of the 8B base parameters
  • LoRA Adapter Size: 4.37 GB
  • Training Time and Cost: about 50 minutes, roughly $2
  • Steps per Epoch: 16 (1,024 samples, batch size 8, gradient accumulation 8; see the check below)
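
These figures can be cross-checked with a few lines of arithmetic, using the hourly rate above and the runtime from the training output below:

import math

samples, batch_size, grad_accum, epochs = 1024, 8, 8, 30
steps_per_epoch = math.ceil(samples / (batch_size * grad_accum))  # 16
total_steps = steps_per_epoch * epochs                            # 480, matches global_step below
cost = 3012.8 / 3600 * 2.50                                       # ~ $2.09 at $2.50/hour
print(steps_per_epoch, total_steps, round(cost, 2))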

Performance Metrics

  • Training Completed:
    • Global steps: 480 (30 epochs)
    • Final training loss: 0.2841
    • Train runtime: 3,012.8 s (≈ 50 minutes)
    • Throughput: 10.2 samples/s, 0.159 steps/s
    • Total FLOPs: 3.87 × 10^17

Weights & Biases Tracking

Training metrics were logged to Weights & Biases (report_to="wandb") under the run name llama3_instruct_fine_tuned_bahn_1k_v1_report.

Usage

To use this model, load it from the Hugging Face Hub under the name islam-hajosman/llama3_instruct_fine_tuned_bahn_1k_v1_model. It is optimized for answering Deutsche Bahn FAQ questions.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "islam-hajosman/llama3_instruct_fine_tuned_bahn_1k_v1_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

input_text = "Ihre Frage hier"  # your question here
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
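
Since the base model is Llama 3 Instruct, applying the tokenizer's chat template usually gives better results than feeding raw text; a minimal sketch (the example question is illustrative):

# Build a chat-formatted prompt with the tokenizer's built-in template
messages = [
    {"role": "user", "content": "Wie kann ich mein Ticket stornieren?"}  # "How can I cancel my ticket?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))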