tags:
- MSMARCO
---

# Description

We use the MS MARCO encoder `msmarco-MiniLM-L-6-v3` from the sentence-transformers library to encode the text from the dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).

The dataset contains the first paragraphs of the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).

```
bi_encoder.max_seq_length = 256
wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)
```

This operation took 35 minutes on a Google Colab notebook with a GPU.

# Reference

More information on MS MARCO encoders is available at https://www.sbert.net/docs/pretrained-models/ce-msmarco.html