---
license: gemma
language:
  - en
pipeline_tag: text-generation
tags:
  - nlp
---

Gemma-2B-Tele-it Model Card

Model Summary

Gemma-2B-Tele-it is an instruction-tuned version of Gemma-2B-Tele, a language model specialized in telecommunications and based on Google's gemma-2b. It was fine-tuned to follow instructions using supervised fine-tuning (SFT) on a combination of the Alpaca and Open-instruct datasets.

Context Length

The context length of the model is 8192 tokens.
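The prompt and the generated tokens share the same 8192-token window, so it can help to cap `max_new_tokens` against the prompt length. The sketch below only does that arithmetic; the helper name `generation_budget` is hypothetical (not part of transformers), and it assumes you already know how many tokens the tokenizer produced for your prompt.

```python
CONTEXT_LENGTH = 8192  # context length stated on this model card


def generation_budget(prompt_tokens: int, requested_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Hypothetical helper: clamp new tokens so prompt + output fits the window."""
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    # A prompt that already fills the window leaves no room to generate.
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, which fills the "
            f"{context_length}-token context window"
        )
    return min(requested_new_tokens, remaining)


print(generation_budget(prompt_tokens=12, requested_new_tokens=100))    # 100
print(generation_budget(prompt_tokens=8150, requested_new_tokens=100))  # 42
```

Clamping (rather than silently truncating the prompt) is just one reasonable policy; the returned value can be passed as `max_new_tokens`.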

Usage

Gemma-2B-Tele-it was fine-tuned on instruction-response pairs from the Alpaca and Open-instruct datasets, with each instruction and its response separated by a "\n" delimiter. To query the model, end your instruction with "\n", as in the following example:

Prompt: Explain to me Shannon capacity.\n

Model: Shannon capacity is a measure of the maximum achievable rate of reliable data transmission that can occur over a noisy channel, named after C. E. Shannon. It is also commonly known as channel capacity in information theory, and it is the largest amount of information that a channel can transmit reliably per unit of time. It is calculated by considering the noise and interference that a transmission may face.
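The format described above amounts to appending a single newline to the instruction. A minimal sketch, assuming the instruction should end in exactly one "\n"; the helper names `build_prompt` and `format_sft_example` are hypothetical, and the training-pair layout mirrors only the "\n" delimiter stated on this card, not the authors' actual preprocessing code.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction as described on the card: text followed by "\n"."""
    # Strip surrounding whitespace so the prompt ends in exactly one newline.
    return instruction.strip() + "\n"


def format_sft_example(instruction: str, response: str) -> str:
    """Hypothetical illustration of an instruction/response pair joined by "\n"."""
    return build_prompt(instruction) + response.strip()


prompt = build_prompt("Explain to me Shannon capacity.")
print(repr(prompt))  # 'Explain to me Shannon capacity.\n'
```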

Sample Code

Below are code snippets to help you get started quickly. First install transformers and PyTorch (pip install transformers torch), then copy the snippet that matches your hardware and adapt it to your use case. The GPU snippet's device_map="auto" option also requires the accelerate package (pip install accelerate).

Running the model on a CPU

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AliMaatouk/Gemma-2B-Tele-it"
# torch_dtype="auto" uses the dtype recorded with the checkpoint instead of forcing float32
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The model expects the instruction followed by a single "\n"
prompt = "Explain to me Shannon capacity.\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# generate() returns the prompt tokens followed by the new ones; keep only the new ones
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)

Running the model on a single GPU or multiple GPUs

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AliMaatouk/Gemma-2B-Tele-it"
# device_map="auto" lets transformers spread the weights across the available devices
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain to me Shannon capacity.\n"
# Move inputs to the device holding the model's first parameters rather than hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

# Keep only the newly generated tokens
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
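Both snippets slice the prompt off the front of `outputs[0]` because, for a decoder-only model like this one, `generate` returns the prompt tokens followed by the newly generated tokens. The same idea on plain Python lists, with made-up token IDs used purely for illustration (`new_tokens_only` is a hypothetical helper, not a transformers function):

```python
from typing import List, Sequence


def new_tokens_only(output_ids: Sequence[int], prompt_ids: Sequence[int]) -> List[int]:
    """Drop the echoed prompt from a decoder-only generation output."""
    # The output should start with the prompt tokens; check before slicing.
    if list(output_ids[: len(prompt_ids)]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt tokens")
    return list(output_ids[len(prompt_ids):])


prompt_ids = [2, 651, 577, 682]           # made-up IDs standing in for the prompt
output_ids = prompt_ids + [1734, 9, 1]    # prompt followed by three new tokens
print(new_tokens_only(output_ids, prompt_ids))  # [1734, 9, 1]
```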

Citation

You can find the paper with all details about the model at https://arxiv.org/abs/2409.05314. Please cite it as follows:

@misc{maatouk2024telellmsseriesspecializedlarge,
      title={Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications}, 
      author={Ali Maatouk and Kenny Chirino Ampudia and Rex Ying and Leandros Tassiulas},
      year={2024},
      eprint={2409.05314},
      archivePrefix={arXiv},
      primaryClass={cs.IT},
      url={https://arxiv.org/abs/2409.05314}, 
}