prithivida committed
Commit
2a2e10d
1 Parent(s): 2deee4b

Update README.md

Files changed (1):
  1. README.md +17 -3
README.md CHANGED
@@ -52,7 +52,7 @@ pipeline_tag: sentence-similarity
  - [How can I reduce overall inference cost ?](#how-can-i-reduce-overall-inference-cost)
  - [How do I reduce vector storage cost?](#how-do-i-reduce-vector-storage-cost)
  - [How do I offer hybrid search to improve accuracy?](#how-do-i-offer-hybrid-search-to-improve-accuracy)
- - [Why not run MTEB?](#why-not-run-mteb)
+ - [CMTEB numbers](#cmteb-numbers)
  - [Roadmap](#roadmap)
  - [Notes on Reproducing:](#notes-on-reproducing)
  - [Reference:](#reference)
@@ -165,9 +165,23 @@ The below numbers are with the mDPR model, but miniMiracle_zh_v1 should give an even
 
  *Note: The MIRACL paper shows a different (higher) value for BM25 Chinese, so we take that value from the BGE-M3 paper; all the rest are from the MIRACL paper.*
 
- #### Why not run cMTEB?
+ #### cMTEB numbers:
  CMTEB is a general-purpose embedding evaluation benchmark covering a wide range of tasks, but like BGE-M3, the miniMiracle models are predominantly tuned for retrieval tasks aimed at search & IR use cases.
- But we would run the retrieval slice of the cMTEB and add the scores here.
+ We ran the retrieval slice of cMTEB and added the scores here.
+
+ We compared the performance of a few top general-purpose embedding models on the C-MTEB benchmark; please refer to the C-MTEB leaderboard.
+
+
+ | Model Name | Model Size (GB) | Dimension | Sequence Length | Retrieval (avg. over 8 tasks) | Remarks |
+ |:----|:---:|:---:|:---:|:---:|:---|
+ | [360Zhinao-search] | 0.61 (FP16) | 1024 | 512 | 75.06 | Top model as of Jun 2024 |
+ | [piccolo-large-zh] | 0.65 (FP16) | 1024 | 512 | 70.93 | |
+ | [bge-large-zh] | 1.3 | 1024 | 512 | 70.46 | |
+ | [piccolo-base-zh] | 0.2 (FP16) | 768 | 512 | 71.2 | |
+ | [bge-large-zh-no-instruct] | 1.3 | 1024 | 512 | 70.54 | |
+ | [bge-base-zh] | 0.41 | 768 | 512 | 69.3 | |
+ | [**miniMiracle_zh_v1**] | **0.47** | **384** | **512** | **59.91** | |
+
 
 
  # Roadmap
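
For anyone reproducing the "retrieval slice of cMTEB" mentioned in the new section, below is a minimal sketch using the `mteb` package together with `sentence-transformers`. The eight task names are the C-MTEB retrieval datasets as registered in `mteb`, and the output folder is an arbitrary choice; none of this is taken from the commit itself.

```python
# Minimal sketch: evaluate miniMiracle_zh_v1 on the C-MTEB retrieval slice.
# Assumes: pip install mteb sentence-transformers
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("prithivida/miniMiracle_zh_v1")

# The eight retrieval datasets that make up the C-MTEB retrieval slice.
CMTEB_RETRIEVAL_TASKS = [
    "T2Retrieval", "MMarcoRetrieval", "DuRetrieval", "CovidRetrieval",
    "CmedqaRetrieval", "EcomRetrieval", "MedicalRetrieval", "VideoRetrieval",
]

evaluation = MTEB(tasks=CMTEB_RETRIEVAL_TASKS)
# Writes one JSON result file per task; averaging nDCG@10 across the eight
# tasks gives the leaderboard-style "Retrieval" number reported in the table.
evaluation.run(model, output_folder="results/miniMiracle_zh_v1")
```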
 
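The table's Dimension and Sequence Length columns can be sanity-checked with a short usage sketch. This assumes miniMiracle_zh_v1 loads through `sentence-transformers` (consistent with the model card's `sentence-similarity` pipeline tag); the sample sentences are purely illustrative.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("prithivida/miniMiracle_zh_v1")

# Illustrative Chinese queries, echoing the FAQ topics in the README.
sentences = ["如何降低整体推理成本？", "混合检索如何提高搜索准确率？"]
embeddings = model.encode(sentences, normalize_embeddings=True)

print(embeddings.shape)      # expected: (2, 384), matching the Dimension column
print(model.max_seq_length)  # expected: 512, matching the Sequence Length column
```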