How did you train m3-retromae?
Hello, bge-m3 is a great approach! It performs well and I love it. Thank you for publishing the great models and papers.
I would like to know how XLM-RoBERTa was extended to support 8192 tokens, as I could not find this in the paper.
Is it correct that you first set the max_position_embeddings of XLM-RoBERTa to 8194 and then created bge-m3-retromae by training on long sequences with RetroMAE?
I would also appreciate it if you could tell me which training datasets you used for that step, if possible.
Thanks for your attention to our work!
We extended the max_position_embeddings of XLM-RoBERTa to 8194 and trained this model on the Pile, mC4, and Wudao datasets with the RetroMAE loss.
For the details of pre-training, you can refer to Appendix B.1 in our paper.
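A minimal sketch of how the position-embedding extension could look with the transformers library (the copy/tiling initialization below is only an illustrative assumption, not necessarily our exact procedure; see the paper for the actual details):

```python
import torch
from transformers import AutoModel

# Load the base multilingual encoder.
model = AutoModel.from_pretrained("xlm-roberta-large")

old_embeddings = model.embeddings.position_embeddings.weight.data  # (514, hidden)
num_old, hidden_size = old_embeddings.shape

# 8192 tokens + 2 offset positions used by RoBERTa-style models -> 8194.
new_max_positions = 8194
new_embeddings = torch.nn.Embedding(
    new_max_positions, hidden_size, padding_idx=model.config.pad_token_id
)

# Illustrative initialization (an assumption): tile the original learned
# positions to fill the extended range before further pre-training.
for start in range(0, new_max_positions, num_old):
    end = min(start + num_old, new_max_positions)
    new_embeddings.weight.data[start:end] = old_embeddings[: end - start]

model.embeddings.position_embeddings = new_embeddings
model.config.max_position_embeddings = new_max_positions

# Extend the cached position_ids buffer so longer inputs are accepted.
model.embeddings.register_buffer(
    "position_ids", torch.arange(new_max_positions).unsqueeze(0), persistent=False
)

# Save as the starting checkpoint for RetroMAE-style pre-training on long texts.
model.save_pretrained("xlm-roberta-large-8k")
```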
Thank you!
I have also read Appendix B.1, which deepened my understanding. I'm very grateful.