princeton-nlp committed on
Commit 1b0c0ab
Parent(s): 05ab6c8

Update README.md

Files changed (1)
  1. README.md +61 -79
README.md CHANGED
@@ -3,7 +3,7 @@ library_name: transformers
  tags: []
  ---
 
- # Model Card for Model ID
 
  SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our [preprint](https://arxiv.org/pdf/2405.14734) and [github repo](https://github.com/princeton-nlp/SimPO) for more details.
 
@@ -12,10 +12,11 @@ SimPO (Simple Preference Optimization) is an offline preference optimization alg
 
  ### Model Description
 
- <!-- Provide a longer summary of what this model is. -->
 
- - **Developed by:** Yu Meng, Mengzhou Xia
  - **Model type:** Causal Language Model
  - **License:** gemma
  - **Finetuned from model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
@@ -30,40 +31,41 @@ SimPO (Simple Preference Optimization) is an offline preference optimization alg
 
  ## How to Get Started with the Model
-
-
 
  ## Training Details
 
  ### Training Data
 
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
 
  #### Training Hyperparameters
 
  - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
- [More Information Needed]
 
  ## Evaluation
 
- <!-- This section describes the evaluation protocols and provides the results. -->
-
  ### Testing Data, Factors & Metrics
 
  #### Testing Data
@@ -92,68 +94,48 @@ SimPO (Simple Preference Optimization) is an offline preference optimization alg
 
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
 
  ### Model Architecture and Objective
 
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
 
  #### Hardware
 
- [More Information Needed]
 
  #### Software
 
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
# gemma-2-9b-it-SimPO Model Card

SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our [preprint](https://arxiv.org/pdf/2405.14734) and [github repo](https://github.com/princeton-nlp/SimPO) for more details.
 
 
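Concretely, with `π_θ` the policy being trained, `y_w` and `y_l` the chosen and rejected responses for a prompt `x`, `β` a reward-scaling constant, and `γ` the target reward margin, the objective from the preprint can be written as:

$$
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) - \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) - \gamma\right)\right]
$$

The implicit reward is the length-normalized log-likelihood of a response under the policy itself, which is why no reference model is needed.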

### Model Description

We fine-tuned [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) with the SimPO objective on a preference optimization dataset whose prompts come from [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

- **Developed by:** Yu Meng, Mengzhou Xia, Danqi Chen
- **Model type:** Causal Language Model
- **License:** gemma
- **Finetuned from model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)

## How to Get Started with the Model
```python
import torch
from transformers import pipeline

model_id = "princeton-nlp/gemma-2-9b-it-SimPO"

# Load the model in bfloat16 on a CUDA device.
generator = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# The tokenizer's built-in chat template is applied automatically when a list
# of chat messages is passed to the pipeline.
outputs = generator(
    [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
    do_sample=False,
    max_new_tokens=200,
)
print(outputs[0]["generated_text"])
```

## Training Details

### Training Data

We use a preference optimization dataset in which the prompts come from [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized); see the [SimPO repo](https://github.com/princeton-nlp/SimPO) for full details.
 
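For orientation, the prompt source above can be inspected as in the sketch below. The split name and the `prompt`/`chosen`/`rejected` columns are the ones `ultrafeedback_binarized` is commonly distributed with, and are an assumption here rather than something stated in this card.

```python
from datasets import load_dataset

# Illustrative sketch only: this is the source of the prompts, not necessarily
# the exact preference pairs this checkpoint was trained on (see the SimPO repo).
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = ds[0]
print(example["prompt"])        # user prompt
print(example["chosen"][-1])    # preferred assistant turn
print(example["rejected"][-1])  # dispreferred assistant turn
```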

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times

Fine-tuning [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) on this preference dataset takes around 100 minutes on 8xH100 GPUs.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

## Technical Specifications

### Model Architecture and Objective

The model architecture is based on [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it). We use the SimPO training objective proposed in our [preprint](https://arxiv.org/pdf/2405.14734).
 
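A minimal PyTorch sketch of that objective is below; it mirrors the formula given earlier, and the `beta`/`gamma` defaults are placeholders rather than the hyperparameters used for this checkpoint.

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor,
               chosen_lens: torch.Tensor, rejected_lens: torch.Tensor,
               beta: float = 2.0, gamma: float = 1.0) -> torch.Tensor:
    """Reference-free SimPO loss for a batch of preference pairs.

    chosen_logps / rejected_logps: summed token log-probs of each response
    under the policy; chosen_lens / rejected_lens: response lengths in tokens.
    """
    # Length-normalized implicit rewards (no reference model involved).
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Logistic loss with a target reward margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```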

#### Hardware

We used 8xH100 GPUs for model training.

#### Software

Training was done using the [alignment-handbook](https://github.com/huggingface/alignment-handbook) library.

## Citation

@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}

@article{meng2024simpo,
  title={{SimPO}: Simple preference optimization with a reference-free reward},
  author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
  journal={arXiv preprint arXiv:2405.14734},
  year={2024}
}

@article{cui2023ultrafeedback,
  title={{UltraFeedback}: Boosting language models with high-quality feedback},
  author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2310.01377},
  year={2023}
}

@article{wang2024interpretable,
  title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
  author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
  journal={arXiv preprint arXiv:2406.12845},
  year={2024}
}