iampanda committed on
Commit 7c9c9c2 (1 parent: 58e695a)

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -1081,7 +1081,6 @@ library_name: sentence-transformers
  - Finally, total amount of synthesized data is about 30 million.
 
  3) **Collect more data for retrieval-type tasks**
- - ***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
  - [miracl/miracl](https://huggingface.co/datasets/miracl/miracl)
  - [FreedomIntelligence/Huatuo26M-Lite](https://huggingface.co/datasets/FreedomIntelligence/Huatuo26M-Lite)
  - [PaddlePaddle/dureader_robust](https://huggingface.co/datasets/PaddlePaddle/dureader_robust) **C-MTEB test filtered**
@@ -1090,6 +1089,9 @@ library_name: sentence-transformers
  - [Shitao/MLDR](https://huggingface.co/datasets/Shitao/MLDR)
  - ...
 
+ ***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
+
+
  **Training loss**
  1) Multi-Task loss like [Piccolo](https://huggingface.co/sensenova/piccolo-large-zh-v2)
  2) Matryoshka Representation Learning