abokbot committed on
Commit
4cc5620
1 Parent(s): f18933f

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
  - MSMARCO
  ---
  # Description
- We use MS Marco Encoder msmarco-MiniLM-L-6-v3 to encode the text from dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).
+ We use MS Marco Encoder msmarco-MiniLM-L-6-v3 from the sentence-transformers library to encode the text from dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).
 
  The dataset contains the first paragraphs of the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).
 
@@ -28,4 +28,7 @@ bi_encoder.max_seq_length = 256
  wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)
 
  ```
- This operation took 35min on a Google Colab notebook with GPU.
+ This operation took 35min on a Google Colab notebook with GPU.
+
+ # Reference
+ More information of MS Marco encoders here https://www.sbert.net/docs/pretrained-models/ce-msmarco.html
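
For readers who only see the two hunks, the encoding step the README describes could look roughly like the sketch below. It is not the README's actual code: the lines between the hunks are not shown in this commit, so the helper name `encode_texts`, the way the model is loaded, and the `split="train"` argument mentioned afterwards are assumptions. Only the model name, `max_seq_length = 256`, and the `encode(...)` arguments come from the diff.

```python
from typing import Sequence


def encode_texts(texts: Sequence[str],
                 model_name: str = "msmarco-MiniLM-L-6-v3",
                 max_seq_length: int = 256):
    """Encode texts with a sentence-transformers bi-encoder.

    Hypothetical helper: the name and signature are illustrative and do not
    appear in the README.
    """
    # Imported lazily so this module loads even when the library is absent.
    from sentence_transformers import SentenceTransformer

    bi_encoder = SentenceTransformer(model_name)
    # 256 is the value visible in the second hunk header of the diff.
    bi_encoder.max_seq_length = max_seq_length
    # Same arguments as the encode(...) call shown in the diff.
    return bi_encoder.encode(list(texts),
                             convert_to_tensor=True,
                             show_progress_bar=True)
```

A possible call, assuming the dataset exposes a `train` split with a `text` column, would be `encode_texts(load_dataset("abokbot/wikipedia-first-paragraph", split="train")["text"])`, with `load_dataset` imported from the `datasets` library. The README reports about 35 minutes for this step on a Google Colab GPU.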