pipeline_tag: text-classification
---

- **Paper:** Coming soon
- **Model:** [URM-LLaMa-3-8B](https://huggingface.co/LxzGordon/URM-LLaMa-3-8B)
- Fine-tuned from [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)

# Brief

[URM-LLaMa-3-8B](https://huggingface.co/LxzGordon/URM-LLaMa-3-8B) is an uncertainty-aware reward model. It consists of a base model and an uncertainty-aware, attribute-specific value head; the base model comes from [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1).

## Attribute Regression

During training, the uncertainty-aware value head does not output multi-attribute scores directly; instead, it outputs the parameters of a normal distribution, from which scores are sampled. We then regress the sampled scores against the labels to train the value head, using the reparameterization trick to allow gradients to back-propagate through the sampling step.
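The sampling-and-regression step described above can be sketched in PyTorch as follows. This is a minimal toy illustration, not the released implementation: the class name, layer layout, and sizes are our assumptions.

```python
import torch
import torch.nn as nn

class ToyUncertaintyHead(nn.Module):
    """Toy value head mapping a hidden state to per-attribute Normal parameters."""
    def __init__(self, hidden_size: int, num_attributes: int = 5):
        super().__init__()
        self.mean = nn.Linear(hidden_size, num_attributes)
        self.log_std = nn.Linear(hidden_size, num_attributes)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        mu = self.mean(hidden)
        sigma = self.log_std(hidden).exp()
        # Reparameterization trick: sample = mu + sigma * eps, so the sample
        # stays differentiable with respect to mu and sigma.
        eps = torch.randn_like(mu)
        return mu + sigma * eps

head = ToyUncertaintyHead(hidden_size=16)
hidden = torch.randn(2, 16)        # stand-in for base-model hidden states
labels = torch.rand(2, 5)          # stand-in for attribute labels
scores = head(hidden)              # sampled attribute scores
loss = nn.functional.mse_loss(scores, labels)  # regression on the samples
loss.backward()                    # gradients flow through the sampling step
```

Sampling from `torch.distributions.Normal(mu, sigma).rsample()` would be equivalent; the explicit `mu + sigma * eps` form makes the reparameterization visible.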

We use the five attributes from HelpSteer2: Helpfulness, Correctness, Coherence, Complexity and Verbosity. These attributes are combined with a weighted sum using the prior weights `[0.3, 0.74, 0.46, 0.47, -0.33]` recommended by [Nemotron-4](https://huggingface.co/nvidia/Nemotron-4-340B-Reward).
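Concretely, the weighted sum is a dot product between the per-attribute scores and the prior weights; the attribute scores below are made-up numbers for illustration only.

```python
# Prior weights for Helpfulness, Correctness, Coherence, Complexity, Verbosity
weights = [0.3, 0.74, 0.46, 0.47, -0.33]

# Hypothetical per-attribute scores for one response
attribute_scores = [3.5, 3.0, 3.8, 1.2, 2.0]

# Scalar reward is the weighted sum of the attribute scores
reward = sum(w * s for w, s in zip(weights, attribute_scores))
print(round(reward, 3))  # 4.922
```

Note the negative Verbosity weight: longer responses are penalized rather than rewarded.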
# Usage
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "LxzGordon/URM-LLaMa-3-8B"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16,  # assumed dtype; the snippet was truncated here
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```