pansophic committed
Commit a3d28ed • 1 Parent(s): ba63e6f

Update README.md

Files changed (1): README.md +5 −3
README.md CHANGED
@@ -12,6 +12,7 @@ base_model: stabilityai/stablelm-3b-4e1t
 
 # Rocket-3B 🦝
 <b>Rocket</b> 🦝 is a 3 billion parameter large language model that was trained on a mix of publicly available datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). The prompt format used is <b>ChatML</b>.
+*The model name is inspired by the small but formidable character from 'Guardians of the Galaxy'. Similar to its namesake, this model, with its 3 billion parameters, showcases remarkable efficiency and effectiveness, challenging larger models despite its smaller size.*
 
 
 ## Model description
@@ -31,13 +32,14 @@ Despite its compact dimensions, the model achieves outstanding scores in both MT
 | Falcon-Instruct 🦅| 40B | SFT |5.17 |45.71|
 | Orca-2| 13B | SFT |6.15 |-|
 | Xwin-LMv0.1 | 7B| PPO | 6.19| 87.83|
-| Llama2-Chat 🦙| 7B |RLHF |6.26| -|
+| Llama2-Chat 🦙| 7B |RLHF |6.26| 71.37|
 | TÜLU 2 🐫| 7B | DPO |6.27| 85.1|
 | Guanaco 🦙| 65B | SFT |6.41| 71.80|
 | **Rocket** 🦝 | **3B** | **DPO** | **6.56** | **79.75** |
-| Llama2-Chat 🦙| 13B |RLHF |6.65| -|
+| Llama2-Chat 🦙| 13B |RLHF |6.65| 81.09|
 | Zephyr-7b-α 🪁 |7B| DPO| 6.88| -|
 | Vicuna v1.3 🦙| 33B | SFT |7.12 |88.99|
+| Zephyr-7b-β 🪁 |7B| DPO| 7.34| 90.60|
 | WizardLM v1.0 🦙| 70B |SFT |7.71 |-|
 | GPT-3.5-turbo | - |RLHF |7.94 |89.37|
 
@@ -129,7 +131,7 @@ generated_text = model.generate(**inputs, max_length=3084, top_p=0.95, do_sample
 ## Bias, Risks, and Limitations
 Unlike ChatGPT, which incorporates in-the-loop filtering of responses and is aligned during the RLHF phase for safe completions, our model lacks these features. Consequently, it may generate problematic outputs, particularly when prompted in certain ways.
 
-The model pretraining datasets are comprised of a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), RedPajama-Data ([Together Computer., 2023](https://github.com/togethercomputer/RedPajama-Data)) and The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)) both without the *Books3* subset, and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)).
+The pretraining dataset comprises a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), RedPajama-Data ([Together Computer, 2023](https://github.com/togethercomputer/RedPajama-Data)) and The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)), both without the *Books3* subset, and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)).
 
 
 *Model card adapted from [Zephyr Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/README.md) and [Tulu-2-7B](https://huggingface.co/allenai/tulu-2-7b/blob/main/README.md)*
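The README states that Rocket-3B expects the ChatML prompt format. As a minimal sketch of what that means (the `to_chatml` helper and the example messages are illustrative, not part of the model card), each conversation turn is wrapped in `<|im_start|>role` / `<|im_end|>` markers, with a trailing assistant header so the model continues from there:

```python
def to_chatml(messages):
    """Render a list of {"role": ..., "content": ...} turns as a ChatML prompt.

    Each turn becomes "<|im_start|>{role}\n{content}<|im_end|>"; a trailing
    "<|im_start|>assistant" header cues the model to generate the reply.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)


# Illustrative conversation, not taken from the model card.
prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name the closest star."},
])
print(prompt)
```

A prompt built this way would then be tokenized and passed to the generation call shown in the hunk header above (`model.generate(**inputs, max_length=3084, top_p=0.95, do_sample=True)`).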