mistrial_nemo_output

This model is a fine-tuned version of unsloth/Mistral-Nemo-Base-2407-bnb-4bit on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5118
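
Since the framework list names PEFT, the published weights are most likely an adapter rather than a full checkpoint. A minimal loading sketch under that assumption is shown here. The repository id `xxxxxccc/mistrial_nemo_output`, the helper name `load_adapter`, and the choice to take the tokenizer from the base model are assumptions, not something the card documents. Loading a bnb-4bit base model also typically requires `bitsandbytes` and `accelerate` to be installed.

```python
BASE_MODEL_ID = "unsloth/Mistral-Nemo-Base-2407-bnb-4bit"  # base model named in this card
ADAPTER_ID = "xxxxxccc/mistrial_nemo_output"  # assumed Hub repository id for the adapter


def load_adapter(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Hypothetical helper: load the base model with the adapter applied.

    Heavy dependencies are imported lazily so that importing this module
    does not require peft or transformers to be installed.
    """
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # AutoPeftModelForCausalLM reads the base model path from the adapter config.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")
    # Assumption: the tokenizer is unchanged from the base model.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    return model, tokenizer
```

Calling `load_adapter()` downloads the weights on first use, so it is best run on a machine with a GPU and enough disk space for the base model.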

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 100
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
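
The total batch sizes above follow from the per-device values, and the cosine schedule can be sketched in a few lines. The snippet below is an illustration, not the trainer's actual code: the function `cosine_lr`, the zero-warmup default, and the `total_steps` value of 387 (estimated from the results table) are assumptions, since the card lists no warmup setting or step count.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 4
eval_batch_size = 4
num_devices = 2
gradient_accumulation_steps = 8
learning_rate = 2e-4

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step, total_steps, peak_lr=learning_rate, warmup_steps=0):
    """Cosine decay from peak_lr to 0, with optional linear warmup (assumed 0 here)."""
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


if __name__ == "__main__":
    print(total_train_batch_size, total_eval_batch_size)
    # total_steps=387 is an estimate, not a value stated in the card.
    for step in (0, 100, 200, 300, 387):
        print(step, cosine_lr(step, total_steps=387))
```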

Training results

Training Loss   Epoch    Step   Validation Loss
1.7287          0.0516   20     1.7033
1.6508          0.1033   40     1.6490
1.6242          0.1549   60     1.6253
1.6216          0.2066   80     1.6089
1.619           0.2582   100    1.5958
1.5579          0.3099   120    1.5842
1.5578          0.3615   140    1.5739
1.5515          0.4132   160    1.5641
1.5739          0.4648   180    1.5550
1.5669          0.5165   200    1.5460
1.5601          0.5681   220    1.5380
1.5392          0.6198   240    1.5310
1.5321          0.6714   260    1.5251
1.5326          0.7230   280    1.5201
1.5197          0.7747   300    1.5165
1.5229          0.8263   320    1.5142
1.4988          0.8780   340    1.5127
1.5044          0.9296   360    1.5119
1.5105          0.9813   380    1.5118
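
Two quantities can be derived from the table as a rough sanity check: the number of optimizer steps in one epoch and the perplexity implied by the final validation loss. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state, so the perplexity is an estimate only.

```python
import math

# Values copied from the first and last rows of the results table.
first_step, first_epoch = 20, 0.0516
final_validation_loss = 1.5118

# Steps per epoch, estimated from the step/epoch ratio of the first logged row.
steps_per_epoch = first_step / first_epoch

# Perplexity, assuming the loss is mean per-token cross-entropy in nats.
perplexity = math.exp(final_validation_loss)

if __name__ == "__main__":
    print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
    print(f"estimated validation perplexity: {perplexity:.2f}")
```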

Framework versions

  • PEFT 0.12.1.dev0
  • Transformers 4.45.0.dev0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1