Comparative Study: Training OPT-350M and GPT-2 on Anthropic’s HH-RLHF Dataset Using Reward-Based Training
The following bitsandbytes quantization config was used during training:
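For context, model cards typically list this config as a flat set of key–value pairs. The sketch below shows the shape such a config usually takes; every value here is an illustrative assumption, not the settings actually used for this training run.

```python
# Illustrative sketch of a bitsandbytes quantization config as it commonly
# appears in Hugging Face model cards. All values below are assumptions for
# demonstration only, NOT the card's actual settings.
quantization_config = {
    "load_in_8bit": True,                      # assumed: load weights in 8-bit
    "load_in_4bit": False,
    "llm_int8_threshold": 6.0,                 # outlier threshold for int8 matmul
    "llm_int8_skip_modules": None,             # modules left in full precision
    "llm_int8_enable_fp32_cpu_offload": False,
}

# Render the config in the bullet style used on model cards.
for key, value in quantization_config.items():
    print(f"- {key}: {value}")
```

Such a dict is what `transformers` serializes into the model card when a model is loaded with a bitsandbytes quantization config.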