
This is the SFT checkpoint used in the Online-RLHF project. See also our technical report.

The model is trained from meta-llama/Meta-Llama-3-8B for 1 epoch on a mixture of diverse, high-quality open-source data; the detailed training parameters are given in the report. It has not been trained with RLHF and can serve as a good starting point for RLHF research.
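Since the checkpoint is meant as a starting point for RLHF experiments, a minimal loading sketch may be useful. It assumes the repository ID `RLHFlow/LLaMA3-SFT` shown on this page, the Hugging Face `transformers` and `torch` packages, and plain-text prompting; the helper names `load_sft_model` and `generate` are hypothetical and are not taken from the report. The prompt format actually used during SFT is not stated here, so check the report before relying on any particular template.

```python
MODEL_ID = "RLHFlow/LLaMA3-SFT"  # repository ID as shown on this page (assumption)


def load_sft_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model in bfloat16 (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the tensor type the checkpoint is stored in.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a plain-text prompt and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated part is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    tok, mdl = load_sft_model()
    print(generate(tok, mdl, "Explain RLHF in one sentence."))
```

Loading the 8.03B-parameter model in BF16 needs roughly 16 GB of memory for the weights alone, so in practice the model is usually moved to a GPU (for example with `model.to("cuda")`) before calling `generate`.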

Model size: 8.03B params (Safetensors). Tensor type: BF16.