intfloat/e5-mistral-7b-instruct · Best Practices for Fine-Tuning Models on Multi-Hop Datasets?

Hello, for my research I’m planning to fine-tune this model on the HoVer dataset, whose queries can require up to 4 documents for verification. I have a question about how to set up the training data for these multi-hop queries.

For a query with n hops, should I include all n ground-truth documents as positive examples and also add n negative examples for that query? I'd like to understand the best way to structure the training data so that the model performs better on multi-hop queries.
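To make the question concrete, here is a minimal sketch of the two layouts I have in mind. It is only an illustration under my own assumptions: `build_example`, `to_triplets`, and the dictionary keys are hypothetical names, the documents are toy strings rather than real HoVer records, and negatives are sampled at random rather than mined.

```python
import random


def build_example(query, positive_docs, candidate_pool, seed=0):
    """Group one n-hop query with its n gold documents and n sampled negatives.

    Hypothetical helper: the keys and the random sampling are assumptions,
    not a format required by the model or by HoVer.
    """
    rng = random.Random(seed)
    positives = list(positive_docs)
    # Negatives are drawn from the pool, excluding every gold document.
    non_gold = [doc for doc in candidate_pool if doc not in positives]
    n = len(positives)
    if len(non_gold) < n:
        raise ValueError("not enough non-gold documents to sample n negatives")
    negatives = rng.sample(non_gold, n)
    return {"query": query, "positives": positives, "negatives": negatives}


def to_triplets(example):
    """Alternative layout: one (query, positive, negative) triplet per hop."""
    return [
        (example["query"], pos, neg)
        for pos, neg in zip(example["positives"], example["negatives"])
    ]


# Toy 2-hop query; the texts are made up for illustration.
pool = ["doc A", "doc B", "doc C", "doc D", "doc E"]
example = build_example("claim that needs doc A and doc B", ["doc A", "doc B"], pool)
triplets = to_triplets(example)
```

The first layout keeps all n positives and n negatives together under one query, while the second repeats the query once per hop. Which of the two (if either) works better for this model is exactly what I am unsure about.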