strategy for training this model

#1
by gd1m3y - opened

I was curious what kind of data was used to train this model with PPO. Also, what strategy was used to decide the reward?

trl internal testing org

Hi @gd1m3y,
Thanks for your interest.
This model is not intended to be trained or used out of the box; it exists only for testing purposes.
