---
base_model: unsloth/Mistral-Nemo-Instruct-2407
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
---

# Uploaded model

- **Developed by:** UsernameJustAnother
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Mistral-Nemo-Instruct-2407

Experimental RP (roleplay) finetune on a private "secret sauce" dataset, using rank-stabilized LoRA (rsLoRA) with r = 256, trained on a Colab A100 instance. Training used ~36 GB of VRAM and took roughly 3.5 hours for 2 epochs.

This model is for A/B testing against Marlin v1, to see what difference rank 256 (v2) makes compared to rank 64 (v1).

```
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 8,160 | Num Epochs = 2
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 2,040
 "-____-"     Number of trainable parameters = 912,261,120

r = 256,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"],
lora_alpha = 16,
lora_dropout = 0,                        # Supports any, but = 0 is optimized
bias = "none",                           # Supports any, but = "none" is optimized
use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
random_state = 3407,
use_rslora = True,                       # lora_alpha --> 16
loftq_config = None,

per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 5,
num_train_epochs = 2,
learning_rate = 2e-5,                    # down from 2e-4, could go down to (5e-5 then 1e-5)
fp16 = not is_bfloat16_supported(),
bf16 = is_bfloat16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
```

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
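
## Training sketch

For reference, here is a minimal sketch of how the settings above plug into Unsloth's `FastLanguageModel` and TRL's `SFTTrainer`. This is not the exact training script: the RP dataset is private, so `dataset`, `max_seq_length`, and `output_dir` below are placeholders, and the `SFTTrainer` keyword arguments follow the API used in Unsloth's Colab notebooks at the time (newer TRL releases move some of them into `SFTConfig`).

```python
# Sketch only: placeholders stand in for the private dataset and local paths.
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

max_seq_length = 8192  # placeholder; set to fit your data and VRAM budget

# Placeholder dataset: the real "secret sauce" RP dataset is not released.
# Any dataset with a "text" column in Mistral's [INST] ... [/INST] format works here.
dataset = Dataset.from_dict({"text": ["<s>[INST] Write a short scene. [/INST] ...</s>"]})

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Mistral-Nemo-Instruct-2407",
    max_seq_length = max_seq_length,
    dtype = None,  # auto-detect (bf16 on A100)
)

# LoRA adapter config, mirroring the card above (rank 256, rsLoRA scaling).
model = FastLanguageModel.get_peft_model(
    model,
    r = 256,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 2,
        learning_rate = 2e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",  # placeholder
    ),
)

trainer.train()
```

With `use_rslora = True` the adapter output is scaled by `lora_alpha / sqrt(r)` instead of `lora_alpha / r`, which is why `lora_alpha` can stay at 16 even at r = 256.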