
phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of microsoft/phi-2 on distilabel-intel-orca-dpo-pairs. The full training notebook can be found here.

It achieves the following results on the evaluation set:

  • Loss: 0.4537
  • Rewards/chosen: -0.0837
  • Rewards/rejected: -1.2628
  • Rewards/accuracies: 0.8301
  • Rewards/margins: 1.1791
  • Logps/rejected: -224.8409
  • Logps/chosen: -203.2228
  • Logits/rejected: 0.4773
  • Logits/chosen: 0.3062
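
For context on the reward metrics: Rewards/margins is the difference between the chosen and rejected rewards, i.e. -0.0837 - (-1.2628) ≈ 1.1791, and Rewards/accuracies is the fraction of evaluation pairs for which the chosen response receives the higher implicit DPO reward.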

Model description

The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the distilabel-intel-orca-dpo-pairs dataset. To scale the LoRA approach for LLMs, I recommend looking at predibase/lorax.

You can play around with the model using the snippet below. We load the LoRA adapter and a bitsandbytes quantization config (only when CUDA is available).

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftConfig, PeftModel

# template used for fine-tune
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_use_double_quant=False,
    )
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
else:
    device = torch.device("cpu")
    bnb_config = None
    print("No GPU available, using CPU instead.")

config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs").to(device)

prompt = "Instruct: What is the capital of France?\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

outputs = model.generate(**inputs, max_new_tokens=50)  # generate a short completion
text = tokenizer.batch_decode(outputs)[0]
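
Because generation continues from the prompt, the decoded text still contains the "Instruct: ... Output:" template. A simple way to keep only the completion (this post-processing is not part of the original notebook) is to strip the prompt prefix:

# Keep only the model's completion by removing the prompt prefix (simple post-processing, not from the notebook).
completion = text[len(prompt):].strip()
print(completion)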

Intended uses & limitations

This is a LoRA adapter fine-tune for phi-2 and not a full fine-tune of the model. Additionally, I did not spend time tuning hyperparameters.

Training and evaluation data

The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the distilabel-intel-orca-dpo-pairs dataset. The full training notebook can be found here. Below are the configs used for the adapter and the trainer.

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.5,
    r=32,
    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
    bias="none",
    task_type="CAUSAL_LM",
)
training_arguments = TrainingArguments(
    output_dir=f"./{model_name}",  # model_name is defined earlier in the training notebook
    evaluation_strategy="steps",
    do_eval=True,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    log_level="debug",
    save_steps=20,
    logging_steps=20,
    learning_rate=1e-5,
    eval_steps=20,
    num_train_epochs=1, # Modified for tutorial purposes
    max_steps=100,
    warmup_steps=20,
    lr_scheduler_type="linear",
)
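
The card does not reproduce the trainer setup itself, so the following is only a rough sketch of how these configs could be wired into trl's DPOTrainer (based on the trl API at the time, early 2024). The dataset column names (input, chosen, rejected), the prompt template mapping, and the beta/length values are assumptions, not the exact notebook code; it reuses the bnb_config, peft_config, and training_arguments defined above.

import torch
from datasets import load_dataset
from peft import prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer

# Load the quantized base model for training (assumes a CUDA machine with the bnb_config from above).
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# Column names (input/chosen/rejected) are an assumption about the dataset layout.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": f"Instruct: {row['input']}\nOutput: ",  # same template as at inference time (assumption)
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }
)
dataset = dataset.train_test_split(test_size=0.1, seed=42)

trainer = DPOTrainer(
    model,
    ref_model=None,              # with a peft_config, the frozen base model serves as the reference
    args=training_arguments,
    beta=0.1,                    # assumed DPO beta
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_prompt_length=512,       # assumed truncation lengths
    max_length=1024,
)
trainer.train()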

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 1
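
The total train batch size of 32 follows from the per-device batch size of 2 multiplied by the 16 gradient accumulation steps on a single GPU.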

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6853        | 0.06  | 20   | 0.6701          | 0.0133         | -0.0368          | 0.6905             | 0.0501          | -212.5803      | -202.2522    | 0.3853          | 0.2532        |
| 0.6312        | 0.12  | 40   | 0.5884          | 0.0422         | -0.2208          | 0.8138             | 0.2630          | -214.4207      | -201.9638    | 0.4254          | 0.2816        |
| 0.547         | 0.19  | 60   | 0.5146          | 0.0172         | -0.5786          | 0.8278             | 0.5958          | -217.9983      | -202.2132    | 0.4699          | 0.3110        |
| 0.4388        | 0.25  | 80   | 0.4893          | -0.0808        | -1.0789          | 0.8293             | 0.9981          | -223.0014      | -203.1934    | 0.5158          | 0.3396        |
| 0.4871        | 0.31  | 100  | 0.4818          | -0.1298        | -1.2346          | 0.8297             | 1.1048          | -224.5586      | -203.6837    | 0.5133          | 0.3340        |
| 0.4863        | 0.37  | 120  | 0.4723          | -0.1230        | -1.1718          | 0.8301             | 1.0488          | -223.9305      | -203.6159    | 0.4910          | 0.3167        |
| 0.4578        | 0.44  | 140  | 0.4666          | -0.1257        | -1.1772          | 0.8301             | 1.0515          | -223.9844      | -203.6428    | 0.4795          | 0.3078        |
| 0.4587        | 0.5   | 160  | 0.4625          | -0.0746        | -1.1272          | 0.8301             | 1.0526          | -223.4841      | -203.1310    | 0.4857          | 0.3139        |
| 0.4688        | 0.56  | 180  | 0.4595          | -0.0584        | -1.1194          | 0.8297             | 1.0610          | -223.4062      | -202.9692    | 0.4890          | 0.3171        |
| 0.4189        | 0.62  | 200  | 0.4579          | -0.0666        | -1.1647          | 0.8297             | 1.0982          | -223.8598      | -203.0511    | 0.4858          | 0.3138        |
| 0.4392        | 0.68  | 220  | 0.4564          | -0.0697        | -1.1915          | 0.8301             | 1.1219          | -224.1278      | -203.0823    | 0.4824          | 0.3110        |
| 0.4659        | 0.75  | 240  | 0.4554          | -0.0826        | -1.2245          | 0.8301             | 1.1419          | -224.4574      | -203.2112    | 0.4761          | 0.3052        |
| 0.4075        | 0.81  | 260  | 0.4544          | -0.0823        | -1.2328          | 0.8301             | 1.1504          | -224.5403      | -203.2089    | 0.4749          | 0.3044        |
| 0.4015        | 0.87  | 280  | 0.4543          | -0.0833        | -1.2590          | 0.8301             | 1.1757          | -224.8026      | -203.2188    | 0.4779          | 0.3067        |
| 0.4365        | 0.93  | 300  | 0.4539          | -0.0846        | -1.2658          | 0.8301             | 1.1812          | -224.8702      | -203.2313    | 0.4780          | 0.3067        |
| 0.4589        | 1.0   | 320  | 0.4537          | -0.0837        | -1.2628          | 0.8301             | 1.1791          | -224.8409      | -203.2228    | 0.4773          | 0.3062        |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.37.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1