	---
	datasets:
	- abokbot/wikipedia-first-paragraph
	language:
	- en
	library_name: sentence-transformers
	tags:
	- bi-encoder
	- MSMARCO
	---
	# Description
	These embeddings were computed with the MS MARCO bi-encoder `msmarco-MiniLM-L-6-v3`, applied to the text of the dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).

	This dataset contains the first paragraphs of the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).


	# Code
	The embeddings were obtained by running the following code.

	```python
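	from datasets import load_dataset
	from sentence_transformers import SentenceTransformer

	# load_dataset returns a DatasetDict unless a split is requested;
	# "train" is assumed here to be the split holding the paragraphs.
	dataset = load_dataset("abokbot/wikipedia-first-paragraph", split="train")
	bi_encoder = SentenceTransformer("msmarco-MiniLM-L-6-v3")
	bi_encoder.max_seq_length = 256  # inputs longer than 256 tokens are truncated
	# Encode every paragraph; the result is a single torch tensor, one row per paragraph.
	wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)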
	```
	Encoding took 35 minutes on a Google Colab notebook with a GPU.
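
Embeddings from a bi-encoder are typically used for retrieval by comparing a query vector against every paragraph vector. The following is a minimal sketch of that ranking step using cosine similarity, not code from this repository: the helper `top_k_cosine` and the toy vectors are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def top_k_cosine(query_vec, corpus_vecs, k=5):
    """Return (indices, scores) of the k corpus rows most similar to query_vec."""
    corpus = np.asarray(corpus_vecs, dtype=np.float32)
    query = np.asarray(query_vec, dtype=np.float32)
    # L2-normalise both sides so the dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = corpus_unit @ query_unit
    k = min(k, len(scores))
    # Sort by descending score and keep the first k positions.
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy 3-dimensional vectors standing in for real paragraph embeddings.
toy_corpus = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
toy_query = np.array([1.0, 0.1, 0.0])
indices, scores = top_k_cosine(toy_query, toy_corpus, k=2)
print(indices, scores)
```

With the real data, the corpus could be passed as `wikipedia_embedding.cpu().numpy()` and the query as `bi_encoder.encode("some question")`. For a corpus of this size, the `util.semantic_search` helper in sentence-transformers is likely a better fit, since it processes the corpus in chunks.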