rungao2001 committed · Commit c117982 · Parent(s): f0beda9

Update README.md

README.md CHANGED
language:
- en
pipeline_tag: text-generation
---

# Model Card

A DPO fine-tune of `rungao2001/vicuna-7b-v1.5_deita10k_sft_full` using the `argilla/dpo-mix-7k` preference dataset.

## Model Details

### Model Description

This model was trained with Direct Preference Optimization (DPO) on the `argilla/dpo-mix-7k` dataset, starting from `rungao2001/vicuna-7b-v1.5_deita10k_sft_full`.

- **Model type:** Llama 2, decoder-only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

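A minimal usage sketch (the Hub id below is a placeholder, not this model's confirmed repo name):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "rungao2001/vicuna-7b-v1.5_deita10k_dpo_full"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Summarize what DPO training does."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=256)  # sampling enabled via generation_config.json
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
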
## Training Details

### Training Data

`argilla/dpo-mix-7k`

### Training Procedure

Direct Preference Optimization (DPO).

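For reference, `trl.DPOTrainer` optimizes the standard DPO objective of Rafailov et al. (2023), where $\pi_{\mathrm{ref}}$ is the frozen SFT model and $\beta$ scales the implicit reward:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$
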
Note: `do_sample` in `generation_config.json` was set to `true` to avoid the error described in https://github.com/huggingface/transformers/issues/29988.
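A sketch of one way to set that flag (assuming the stock `transformers.GenerationConfig` API; the output directory is illustrative):

```python
from transformers import GenerationConfig

# Load the generation config shipped with the model, enable sampling, and
# save it back; this writes generation_config.json with "do_sample": true.
gen_config = GenerationConfig.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
gen_config.do_sample = True
gen_config.save_pretrained("./model_dir")  # illustrative output directory
```
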
Note: the `chat_template` was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The modification removes the "Conversation roles must alternate user/assistant/user/assistant/..." error, and the system message is emitted only when `loop.index0 == 0` and `role == 'user'`.
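A hypothetical reconstruction of that template (not the author's exact string; the stock Vicuna 1.1 system prompt below is an assumption, since the card does not quote it):

```python
from transformers import AutoTokenizer

# Assumed system prompt: Vicuna 1.1's stock text.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

# Emit the system prompt only before the first user turn
# (loop.index0 == 0 and role == 'user'), then strictly alternate
# USER/ASSISTANT turns so trl.DPOTrainer's alternation check passes.
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    '{% if loop.index0 == 0 and message["role"] == "user" %}'
    '{{ "' + SYSTEM + ' " }}'
    "{% endif %}"
    '{% if message["role"] == "user" %}'
    '{{ "USER: " + message["content"] + " " }}'
    '{% elif message["role"] == "assistant" %}'
    '{{ "ASSISTANT: " + message["content"] + "</s>" }}'
    "{% endif %}"
    "{% endfor %}"
)

tokenizer = AutoTokenizer.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
tokenizer.chat_template = CHAT_TEMPLATE
```
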
#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156

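A hedged sketch of how these hyperparameters map onto `trl` (argument names follow recent `DPOConfig`-style releases and differ in older ones, where `processing_class` was `tokenizer`; the per-device/accumulation split below is an assumption that multiplies out to the global batch size of 128):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "rungao2001/vicuna-7b-v1.5_deita10k_sft_full"

# Assumed split: 8 per device x 16 accumulation steps = 128 global batch.
config = DPOConfig(
    output_dir="vicuna-7b-dpo",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    learning_rate=1.0e-6,
    num_train_epochs=3,
    bf16=True,
    max_prompt_length=1800,
    max_length=2048,
)

trainer = DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained(BASE),
    args=config,
    train_dataset=load_dataset("argilla/dpo-mix-7k", split="train"),
    processing_class=AutoTokenizer.from_pretrained(BASE),
)
trainer.train()
```
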
## Evaluation

Training finally reached loss = 0.5006 and `rewards/accuracies` = 78.72% on the eval set of `HuggingFaceH4/deita-10k-v0-sft`.
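
For context, `trl` reports `rewards/accuracies` as the fraction of preference pairs whose implicit reward for the chosen response exceeds that of the rejected one:

$$
\mathrm{accuracy} = \mathbb{E}\left[ \mathbb{1}\{ r_\theta(x, y_w) > r_\theta(x, y_l) \} \right], \qquad r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$
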
### Testing Data, Factors & Metrics