abokbot committed on
Commit
c786fa0
1 Parent(s): 4fb9ec2

Create README.md

Files changed (1)
  1. README.md +30 -0
README.md ADDED
@@ -0,0 +1,30 @@
+ ---
+ datasets:
+ - abokbot/wikipedia-first-paragraph
+ language:
+ - en
+ library_name: sentence-transformers
+ tags:
+ - bi-encoder
+ - MSMARCO
+ ---
+ # Description
+ We use the MS MARCO bi-encoder `msmarco-MiniLM-L-6-v3` to encode the text of the [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph) dataset.
+
+ This dataset contains the first paragraphs of the articles in the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).
+
+
+ # Code
+ The embeddings were obtained by running the following code.
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+
+ # Assumes the dataset exposes a "train" split with a "text" column.
+ dataset = load_dataset("abokbot/wikipedia-first-paragraph", split="train")
+ bi_encoder = SentenceTransformer('msmarco-MiniLM-L-6-v3')
+ bi_encoder.max_seq_length = 256  # longer paragraphs are truncated to 256 tokens
+ wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)
+ ```
+ This operation took 35 minutes on a Google Colab notebook with a GPU.
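
Embeddings like these usually serve as the corpus side of a semantic search: a query is encoded with the same bi-encoder and compared against every paragraph vector. The README does not say how the embeddings are used, so the following is only a minimal sketch of that idea, assuming cosine similarity as the scoring function. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical stand-ins, not part of the original commit.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, corpus_embeddings, k=2):
    """Return (corpus_index, score) pairs for the k most similar corpus vectors."""
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(corpus_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for real paragraph embeddings (hypothetical values).
corpus_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
# Toy stand-in for an encoded query (hypothetical values).
query_embedding = [0.9, 0.1, 0.0]

hits = top_k(query_embedding, corpus_embeddings, k=2)
for index, score in hits:
    print(index, round(score, 3))
```

With real tensors, `sentence_transformers.util.semantic_search` performs the same kind of top-k lookup far more efficiently. The pure-Python version here only illustrates the ranking logic.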