How did you train m3-retromae?
Hello, bge-m3 is a great approach! It performs well and I love it. Thank you for publishing the great models and papers.
I would like to know how XLM-RoBERTa was extended to support 8192 tokens, as I could not find this in the paper.
Is it correct that you first set the max_position_embeddings of XLM-RoBERTa to 8194 and then created bge-m3-retromae by training on long sequences with RetroMAE?
I would also appreciate it if you could tell me which training datasets you used for that step, if possible.
Thanks for your attention to our work!
We extended the max_position_embeddings of XLM-RoBERTa to 8194 and trained this model on the Pile, mC4, and Wudao datasets with the RetroMAE loss.
For the details of pre-training, you can refer to Appendix B.1 in our paper.
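A minimal sketch of how the position-embedding extension could look with the transformers library (the copy/tiling initialization below is only an illustrative assumption, not necessarily our exact procedure; see the paper for the actual details):

```python
import torch
from transformers import AutoModel

# Load the base multilingual encoder.
model = AutoModel.from_pretrained("xlm-roberta-large")

old_embeddings = model.embeddings.position_embeddings.weight.data  # (514, hidden)
num_old, hidden_size = old_embeddings.shape

# 8192 tokens + 2 offset positions used by RoBERTa-style models -> 8194.
new_max_positions = 8194
new_embeddings = torch.nn.Embedding(
    new_max_positions, hidden_size, padding_idx=model.config.pad_token_id
)

# Illustrative initialization (an assumption): tile the original learned
# positions to fill the extended range before further pre-training.
for start in range(0, new_max_positions, num_old):
    end = min(start + num_old, new_max_positions)
    new_embeddings.weight.data[start:end] = old_embeddings[: end - start]

model.embeddings.position_embeddings = new_embeddings
model.config.max_position_embeddings = new_max_positions

# Extend the cached position_ids buffer so longer inputs are accepted.
model.embeddings.register_buffer(
    "position_ids", torch.arange(new_max_positions).unsqueeze(0), persistent=False
)

# Save as the starting checkpoint for RetroMAE-style pre-training on long texts.
model.save_pretrained("xlm-roberta-large-8k")
```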
Thank you!
I have also read Appendix B.1, which deepened my understanding. I'm very grateful.