
outputs

This model is a fine-tuned version of microsoft/phi-2, trained with the trl library on the HuggingFaceH4/ultrafeedback_binarized dataset.

What's new

A test of the ORPO method ("ORPO: Monolithic Preference Optimization without Reference Model") using the trl library.
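
For readers new to ORPO, the objective can be sketched in a few lines of plain Python. This is a minimal illustration of the loss as described in the ORPO paper, not the trl implementation: the function names are hypothetical, the inputs are length-averaged log-probabilities of the chosen and rejected responses, and the weight `lam=0.1` is an assumed illustrative value rather than a setting confirmed by this card.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus a weighted odds-ratio penalty.

    lam is an assumed illustrative weight, not a value taken from this card.
    """
    # Supervised term: negative log-likelihood of the chosen response.
    sft_term = -avg_logp_chosen
    # Log of the odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term


if __name__ == "__main__":
    # The penalty shrinks as the rejected response becomes less likely.
    print(orpo_loss(-0.5, -0.5))
    print(orpo_loss(-0.5, -2.0))
```

Both terms are computed from the policy model alone, which is why no separate reference model is needed.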

How to reproduce

accelerate launch --config_file=/path/to/trl/examples/accelerate_configs/deepspeed_zero2.yaml \
    --num_processes 8 \
    /path/to/trl/scripts/orpo.py \
    --model_name_or_path="microsoft/phi-2" \
    --per_device_train_batch_size 1 \
    --max_steps 8000 \
    --learning_rate 8e-5 \
    --gradient_accumulation_steps 1 \
    --logging_steps 20 \
    --eval_steps 2000 \
    --output_dir="orpo-lora-phi2" \
    --optim rmsprop \
    --warmup_steps 150 \
    --bf16 \
    --logging_first_step \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r=16 \
    --lora_alpha=16 \
    --dataset HuggingFaceH4/ultrafeedback_binarized
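
As a rough sanity check on the flags above, the effective batch size and the total number of preference pairs processed can be worked out as follows. This sketch assumes each of the 8 processes drives one device under plain data parallelism (which is how DeepSpeed ZeRO-2 distributes batches); the variable names simply mirror the command-line flags.

```python
# Values copied from the launch command.
num_processes = 8
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
max_steps = 8000

# Preference pairs consumed per optimizer step across all processes.
effective_batch_size = (
    num_processes * per_device_train_batch_size * gradient_accumulation_steps
)

# Total preference pairs processed over the whole run.
pairs_seen = effective_batch_size * max_steps

print(f"effective batch size: {effective_batch_size}")
print(f"preference pairs seen: {pairs_seen}")
```

Launching with a different number of processes changes the effective batch size, so `--gradient_accumulation_steps` would need adjusting to keep the run comparable.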

Safetensors: model size 2.78B params, tensor type BF16

Model tree for Amu/orpo-lora-phi2

Base model: microsoft/phi-2
Finetuned: this model