How did you manage to train this on a T4 alone?

#2
by brianhuynhML - opened

I was shocked to find that you managed to train all of this on a single T4 GPU. I attempted fine-tuning on Colab before with a T4, but the process took 7 hours and I only had 2 hours and 30 minutes of compute left. How did your fine-tuning run go on for 22 days without all of your progress being deleted? I am interested.

AGI-0 Labs org

I created it 22 days ago; it didn't take 22 days to train. It took around 30 minutes on Colab.
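For context, a back-of-the-envelope memory estimate shows why fine-tuning an 8B model on a 16 GB T4 is plausible at all, assuming a parameter-efficient method such as QLoRA (the thread doesn't state the exact recipe, so the numbers below are purely illustrative):

```python
# Rough VRAM estimate for the model weights during a QLoRA-style fine-tune.
# Assumption: 4-bit quantized base weights plus a small set of fp16 LoRA
# adapters -- the actual training setup used here is not stated.

def estimate_vram_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory in GB for a given parameter count and precision."""
    return n_params * bits_per_param / 8 / 1e9

base_4bit = estimate_vram_gb(8e9, 4)        # 8B base weights quantized to 4-bit
lora_adapters = estimate_vram_gb(40e6, 16)  # ~40M trainable LoRA params (illustrative) in fp16
total = base_4bit + lora_adapters

print(f"4-bit base weights: {base_4bit:.1f} GB")
print(f"LoRA adapters:      {lora_adapters:.2f} GB")
print(f"Weights total:      {total:.2f} GB (a T4 has 16 GB)")
```

Activations, optimizer state for the adapters, and the KV cache add overhead on top of this, but since only the small adapter matrices are trained, the whole run stays well within a T4's 16 GB.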

Just tested the model on Spaces. I can say that the output is much more clear and easier to understand than the base Llama 3.1 8B. Here are some screenshots:
Your model:
Screenshot 2024-09-09 at 18.45.23.png
Screenshot 2024-09-09 at 18.45.29.png
Screenshot 2024-09-09 at 18.45.33.png
Base
Screenshot 2024-09-09 at 18.46.59.png
Screenshot 2024-09-09 at 18.47.03.png
Screenshot 2024-09-09 at 18.47.08.png

As you can see, the base model just spits out the SQL schema without explaining the logic and functionality behind it, whereas Artificium explains how each field works.
Note: I chatted with both the base model and Artificium for a while and found that Artificium takes a more step-by-step approach than the base model. It may be different for others, but that was my experience.
