---
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- nlp
---
# Phi-1.5-Tele Model Card
## Model Summary
The language model Phi-1.5-Tele is a Transformer with **1.3 billion** parameters, specialized in telecommunications. It is based on Microsoft [phi-1.5](https://huggingface.co/microsoft/phi-1_5) and was continually pretrained on [Tele-Data](https://huggingface.co/datasets/AliMaatouk/Tele-Data), a large-scale dataset of approximately 2.5 billion tokens of telecommunications material, including articles, standards, and general web content related to the telecommunications domain.
When assessed against telecommunications benchmarks such as [Tele-Eval](https://huggingface.co/datasets/AliMaatouk/Tele-Eval), Phi-1.5-Tele outperforms [phi-1.5](https://huggingface.co/microsoft/phi-1_5) by several percentage points. Additionally, Phi-1.5-Tele matches [phi-1.5](https://huggingface.co/microsoft/phi-1_5) across benchmarks related to common sense, language understanding, and logical reasoning. Thus, the domain adaptation was achieved with minimal compromise to the performance of the original model.
### Context Length
The model was trained on a context length of 2048 tokens.
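When feeding longer documents to the model, inputs can be truncated to this window at tokenization time. A minimal sketch, where `long_document` is a placeholder and `max_length` mirrors the context length stated above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Phi-1.5-Tele")

long_document = "..."  # placeholder for a long telecom document or prompt
# Truncate anything beyond the 2048-token context window
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=2048)
print(inputs["input_ids"].shape)  # at most (1, 2048)
```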
## Usage
Phi-1.5-Tele is a base model best suited for fine-tuning on applications related to telecommunications. Although it has not been specifically fine-tuned to follow instructions, it can be prompted to answer questions and follow instructions using the following format:
```markdown
Write me a poem about telecommunications.
Answer: This world so vast and wide, we send our thoughts fast,
With technology that allows us to be ever part of it.
We connect, we share, we unite,
Through the web of information, so vast and complete.
```
where the model generates the text after "Answer:".
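Since the model is intended as a base for further adaptation, a supervised fine-tuning run can be set up with the standard `transformers` Trainer. The sketch below is illustrative only: the dataset file `telecom_sft.jsonl`, its `question`/`answer` fields, and the hyperparameters are hypothetical and do not reflect the authors' training recipe.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "AliMaatouk/Phi-1.5-Tele"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # the tokenizer ships without a dedicated pad token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Hypothetical instruction/answer pairs in JSON Lines format
dataset = load_dataset("json", data_files="telecom_sft.jsonl", split="train")

def tokenize(example):
    # Mirror the prompting format shown above: question, then "Answer:" and the reply
    text = f"{example['question']}\nAnswer: {example['answer']}"
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

args = TrainingArguments(
    output_dir="phi-1.5-tele-sft",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-5,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```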
## Sample Code
Below, we share some code snippets showing how to quickly get started with running the model. First, make sure to `pip install -U transformers`, then copy the snippet corresponding to your hardware and adapt it to your use case.
#### Running the model on a CPU
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer; weights stay on the CPU
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/Phi-1.5-Tele", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Phi-1.5-Tele")

prompt = "Write me a poem about telecommunications.\nAnswer:"
input_ids = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt
generated_tokens = outputs[0, len(input_ids['input_ids'][0]):]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```
#### Running the model on a single / multi GPU
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the weights across the available GPU(s)
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/Phi-1.5-Tele", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Phi-1.5-Tele")

prompt = "Write me a poem about telecommunications.\nAnswer:"
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt
generated_tokens = outputs[0, len(input_ids['input_ids'][0]):]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```
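As an alternative to calling `generate` directly, the same model can be wrapped in the `transformers` text-generation pipeline. A minimal sketch using the prompt format shown above:

```python
from transformers import pipeline

# Build a text-generation pipeline; device_map="auto" uses a GPU when available
generator = pipeline(
    "text-generation",
    model="AliMaatouk/Phi-1.5-Tele",
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Write me a poem about telecommunications.\nAnswer:"
output = generator(prompt, max_new_tokens=100, return_full_text=False)
print(output[0]["generated_text"])
```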
## Citation
You can find the paper with all details about the model at https://arxiv.org/abs/2309.05463. Please cite it as follows:
```bib
@article{textbooks2,
title={Textbooks Are All You Need II: \textbf{phi-1.5} technical report},
author={Li, Yuanzhi and Bubeck, S{\'e}bastien and Eldan, Ronen and Del Giorno, Allie and Gunasekar, Suriya and Lee, Yin Tat},
journal={arXiv preprint arXiv:2309.05463},
year={2023}
}
```