tags:
- MSMARCO
---

# Description

We use the MS MARCO encoder `msmarco-MiniLM-L-6-v3` from the sentence-transformers library to encode the text from the dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).

The dataset contains the first paragraphs of the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).

```
bi_encoder.max_seq_length = 256
wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)
```

This operation took 35 minutes on a Google Colab notebook with a GPU.

# Reference

More information on MS MARCO encoders is available at https://www.sbert.net/docs/pretrained-models/ce-msmarco.html