---
library_name: peft
tags:
- generated_from_trainer
- axolotl
base_model: winglian/meta-llama3-chatml
model-index:
- name: llama-3-orpo-qlora
  results: []
datasets:
- mlabonne/orpo-dpo-mix-40k
---

WandB: https://wandb.ai/oaaic/orpo-llama-3/runs/gc2d3cxp

Benchmarks: TBD

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: winglian/meta-llama3-chatml
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_4bit: true

rl: orpo
orpo_alpha: 0.1

chat_template: chatml
datasets:
  - path: mlabonne/orpo-dpo-mix-40k
    type: chat_template.argilla
    chat_template: chatml
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./llama-3-orpo-qlora

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: false

adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

wandb_project: orpo-llama-3
wandb_entity: oaaic
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1.4e-5
max_grad_norm: 1.0

train_on_inputs: false
group_by_length: false
bf16: true
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 5
saves_per_epoch: 1
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
```

</details><br>
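For orientation, the adapter and quantization settings in the config above correspond roughly to the following PEFT + bitsandbytes setup. This is a minimal sketch of the equivalent configuration, not axolotl's internal code; the NF4 quant type and bf16 compute dtype are assumptions, since the config only states `load_in_4bit: true`.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base model, mirroring `load_in_4bit: true` (NF4 + bf16 compute assumed, not stated in the config)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "winglian/meta-llama3-chatml",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA hyperparameters taken directly from the axolotl config above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```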

# llama-3-orpo-qlora

This model is a QLoRA fine-tune of winglian/meta-llama3-chatml, trained with ORPO on the mlabonne/orpo-dpo-mix-40k dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 1241

### Training results

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0
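## Usage

A minimal sketch for loading this adapter on top of the base model and generating with the ChatML template. The adapter path `./llama-3-orpo-qlora` matches the `output_dir` in the config above; substitute the Hub repo id of this adapter if loading remotely. The prompt and generation settings are illustrative, not tuned.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "winglian/meta-llama3-chatml"
adapter_path = "./llama-3-orpo-qlora"  # output_dir from the config; replace with the Hub repo id if applicable

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_path)

# The adapter was trained with ChatML formatting (`chat_template: chatml` in the config);
# this assumes the base tokenizer ships with that chat template.
messages = [{"role": "user", "content": "Explain ORPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```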