Training details

#1
by Viewegger - opened

Hello,

Thank you for making these models available to the public!
I would like to try fine-tuning the model on a similarly structured dataset, just for a different language.

Could I ask what your hardware setup was and which parameters you used? And how long did the training take overall?

Hello,
Thank you for your interest in our work! We're thrilled to hear that you're considering fine-tuning the model for a different language.

Importance of Data

From our experience, the cornerstone of successful fine-tuning lies in the quality of the data. In fact, the majority of our R&D efforts were focused on this aspect.

Hardware Setup

We utilized an AWS EC2 instance of type g5.12xlarge, equipped with 4 x A10G Tensor Core GPUs, for our training.

Training Parameters

Here are the specifics of the parameters we used; a sketch of how they might map to a training script follows the list:

  • Batch Size: 32
  • Micro Batch Size: 4
  • Validation Set Size: 32
  • Learning Rate: 4e-4
  • Optimizer: adamw_torch
  • Warmup Steps: 100
  • LoRA Parameters:
    • lora_r: 16
    • lora_alpha: 32
    • lora_dropout: 0.05
    • qlora: True
    • lora_target_modules: ['q_proj','v_proj']
  • Load in 8-bit: False
  • Cutoff Length: 2048
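
To show how these settings fit together, here is a minimal sketch using the Hugging Face transformers and peft libraries. This is an illustration, not our exact training script: the model name, the 4-bit quantization details, and the multi-GPU batch arithmetic are assumptions.

```python
# Illustrative mapping of the parameters above onto a PEFT/QLoRA setup.
# MODEL_NAME is a placeholder; this is not the exact script we used.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_NAME = "..."  # placeholder

# qlora: True with Load in 8-bit: False -> assume 4-bit NF4 quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                 # lora_r
    lora_alpha=32,                        # lora_alpha
    lora_dropout=0.05,                    # lora_dropout
    target_modules=["q_proj", "v_proj"],  # lora_target_modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Batch Size 32 with Micro Batch Size 4 gives 8 gradient-accumulation steps
# on a single device; with data parallelism across 4 GPUs this would
# typically be divided by the world size. Cutoff Length 2048 is applied at
# tokenization time (max_length=2048), not here.
training_args = TrainingArguments(
    per_device_train_batch_size=4,  # Micro Batch Size
    gradient_accumulation_steps=8,  # Batch Size / Micro Batch Size
    learning_rate=4e-4,             # Learning Rate
    optim="adamw_torch",            # Optimizer
    warmup_steps=100,               # Warmup Steps
    num_train_epochs=1,
    output_dir="outputs",
)
```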

Training Time

The training process took approximately 22 hours to complete a single epoch.

Additional Tips

  • Smaller Models: We recommend starting with the 7b model for quicker experimentation. Throughout our project, experimenting with smaller models helped us make informed decisions about our dataset.
  • Translation Pairs: We're currently investigating the impact of translation pairs, and our early results look promising. You might find it beneficial to explore this method further. Our dataset, which focuses on English-Norwegian translation pairs, is available here. It was created by matching IDs between the English and Norwegian datasets, specifically from this source; a minimal sketch of the ID-matching approach follows below.
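
For anyone curious what the ID matching might look like, here is a minimal sketch; the dataset names and the "id"/"text" column names are placeholders, not the actual schema we used:

```python
# Minimal sketch of building translation pairs by joining an English and a
# Norwegian dataset on a shared ID. Names and columns are placeholders.
from datasets import load_dataset

en = load_dataset("english-dataset", split="train")    # placeholder
no = load_dataset("norwegian-dataset", split="train")  # placeholder

# Index the Norwegian rows by ID, then keep only IDs present in both sets.
no_by_id = {row["id"]: row for row in no}
pairs = [
    {"id": row["id"], "en": row["text"], "no": no_by_id[row["id"]]["text"]}
    for row in en
    if row["id"] in no_by_id
]
print(f"Built {len(pairs)} translation pairs")
```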

Feel free to reach out if you have any more questions or need further clarification.
Best regards, Ruter AI team

That's actually much less demanding than I thought. I have a local rig with 4x3090, so with some testing it shouldn't take more than a few weeks.

Thank you!

Viewegger changed discussion status to closed
