GPT-JT Model

#5
by LEberdeX - opened

As far as I know, thanks to some training optimizations, it works much better than regular GPT-J, and on some tasks it can even match 100B+ models.

You might run into issues because JT is not meant for chatting at all, so switching to it may be a step backwards. The best way forward would be to continue as things are now until this 6B version reaches JT's level using different data.

Also, to be clear, I'm not saying JT is bad at all; it's just not good for this use case.

Yes, it seems to classify better. Perhaps it's not as good at text generation. But honestly, I'm still only studying neural networks, so I could be wrong.

It performs far worse on logical and moral puzzles, including most cognition tests I've run. For anything scientific or practical I would choose JT, but for a really good chat model with strong reasoning I'd stay where we are with Pygmalion and focus on improving that.

Pygmalion org

I've got an eye on the fine-tuning methods used by JT, but:

  • The code has not been released, and I don't have time to try re-implementing it; and

  • According to the author(s?), we're probably better off just focusing on getting more/better data instead:

    the UL2 training objective also contributes to the overall performance, although it should be noted that the improvement from this is relatively small (~1%). To summarize, adding more data is often one of the most effective ways to improve a specific task. However, once you hit the wall, UL2 can also be considered as a potential approach to further improve the performance.

That being the case, I have no plans to use anything from JT at the moment.
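
For anyone curious what the UL2 objective mentioned above actually looks like: it's a mixture-of-denoisers training scheme, where each example is corrupted under one of several regimes before the model learns to reconstruct it. Below is a minimal, illustrative sketch of that idea in Python. It is not GPT-JT's actual training code (which hasn't been released); the mode tags, sentinel format, span lengths, and noise rates are placeholder assumptions.

```python
import random

# Assumed denoiser mix, loosely following the UL2 paper's R/X/S regimes.
DENOISERS = [
    ("[R]", dict(mean_span=3, noise=0.15)),   # regular span corruption
    ("[X]", dict(mean_span=32, noise=0.50)),  # extreme corruption
    ("[S]", None),                            # sequential / prefix-LM
]

def span_corrupt(tokens, mean_span, noise):
    """Mask random spans; return (corrupted input, sentinel-delimited targets)."""
    budget = max(1, int(len(tokens) * noise))  # roughly `noise` fraction of tokens to mask
    inp, tgt, i, sentinel = [], [], 0, 0
    while i < len(tokens):
        if budget > 0 and random.random() < noise:
            span = min(max(1, int(random.expovariate(1 / mean_span))),
                       budget, len(tokens) - i)
            inp.append(f"<extra_id_{sentinel}>")  # sentinel replaces the span in the input
            tgt.append(f"<extra_id_{sentinel}>")  # target reproduces the masked span
            tgt.extend(tokens[i:i + span])
            sentinel += 1
            budget -= span
            i += span
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt

def make_ul2_example(tokens):
    """Pick one denoiser at random and build a single (input, target) pair."""
    tag, cfg = random.choice(DENOISERS)
    if cfg is None:  # S-denoiser: predict a suffix from a prefix
        cut = random.randint(1, len(tokens) - 1)
        return [tag] + tokens[:cut], tokens[cut:]
    inp, tgt = span_corrupt(tokens, **cfg)
    return [tag] + inp, tgt

if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    inp, tgt = make_ul2_example(toks)
    print("input: ", " ".join(inp))
    print("target:", " ".join(tgt))
```

The point from the quoted author stands either way: an objective like this buys roughly 1% on top of what more/better data gets you, which is why it isn't a priority here.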

11b changed discussion status to closed
