Commit 0fa588a (parent e90e8d4) by AliMaatouk

Update README.md

Files changed (1): README.md (+88, -3)

---
license: gemma
language:
- en
pipeline_tag: text-generation
tags:
- nlp
---

# Gemma-2B-Tele Model Card

## Model Summary

The language model Gemma-2B-Tele is a Transformer with **2 billion** parameters, specialized in telecommunications. It is based on Google's [gemma-2b](https://huggingface.co/google/gemma-2b) and was continually pretrained on [Tele-Data](https://huggingface.co/datasets/AliMaatouk/Tele-Data), a large-scale dataset of approximately 2.5 billion tokens of telecommunications material, including articles, standards, and general web content related to the telecommunications domain.
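
If you would like to inspect the pretraining corpus, a minimal sketch using the `datasets` library follows; the split name is an assumption, so consult the Tele-Data dataset card for the exact configuration.

```python
# Minimal sketch for browsing Tele-Data with the `datasets` library.
# The split name below is an assumption; see the dataset card for the real one.
from datasets import load_dataset

tele_data = load_dataset("AliMaatouk/Tele-Data", split="train")  # assumed split name
print(tele_data)     # dataset size and column names
print(tele_data[0])  # first record of the corpus
```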

When assessed against telecommunications benchmarks such as [Tele-Eval](https://huggingface.co/datasets/AliMaatouk/Tele-Eval), Gemma-2B-Tele outperforms [gemma-2b](https://huggingface.co/google/gemma-2b) by several percentage points. Additionally, Gemma-2B-Tele matches [gemma-2b](https://huggingface.co/google/gemma-2b) across benchmarks related to common sense, language understanding, and logical reasoning. Thus, this specialization was achieved with minimal compromise to the performance of the original model.
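
As an illustration of how such an evaluation could be run, here is a hypothetical sketch that generates answers for a few Tele-Eval items; the split and the `question` field name are assumptions (check the Tele-Eval dataset card), and a real benchmark run would also need a scoring step.

```python
# Hypothetical evaluation loop: generate answers for a few Tele-Eval questions.
# The split and "question" column names are assumptions; see the dataset card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AliMaatouk/Gemma-2B-Tele", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Gemma-2B-Tele")

eval_set = load_dataset("AliMaatouk/Tele-Eval", split="train")  # assumed split name
for sample in eval_set.select(range(3)):
    inputs = tokenizer(sample["question"], return_tensors="pt")  # assumed field name
    outputs = model.generate(**inputs, max_new_tokens=50)
    answer = tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(answer)
```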

### Context Length

The model was trained on a context length of 8192 tokens.
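
In practice, this means inputs longer than 8192 tokens should be truncated at tokenization time; a small sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Gemma-2B-Tele")
long_document = "..."  # stands in for any text that may exceed the context window
# Cap inputs at the model's 8192-token training context.
inputs = tokenizer(long_document, truncation=True, max_length=8192, return_tensors="pt")
```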

## Usage

Gemma-2B-Tele is a base model best suited for fine-tuning on applications related to telecommunications (a minimal fine-tuning sketch is given after the example below). It has not been fine-tuned to follow instructions and operates solely within a text completion framework. An example of this completion can be found below:

```markdown
Prompt: Shannon capacity is

Model: the maximum rate at which information can be reliably transmitted over a communication channel. It is named after Claude Shannon, who introduced the concept in his 1948 paper "A Mathematical Theory of Communication".
```
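
Since the model is intended as a base for domain fine-tuning, the following is a minimal sketch of causal-LM fine-tuning with the `transformers` `Trainer`. The toy dataset, hyperparameters, and column names here are illustrative assumptions, not the recipe used to train this model.

```python
# Minimal causal-LM fine-tuning sketch (illustrative; not the authors' recipe).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("AliMaatouk/Gemma-2B-Tele", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Gemma-2B-Tele")

# Toy corpus standing in for your own telecom fine-tuning data.
corpus = Dataset.from_dict({"text": ["5G NR uses OFDM waveforms in both uplink and downlink."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-2b-tele-ft",  # illustrative settings
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           logging_steps=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```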

The instruction-tuned version of this model is available at [Gemma-2B-Tele-it](https://huggingface.co/AliMaatouk/Gemma-2B-Tele-it).

## Sample Code

Below, we share some code snippets for quickly getting started with the model. First, make sure to `pip install transformers`, then copy the snippet corresponding to your hardware and adapt it to your use case.

#### Running the model on a CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer; torch_dtype="auto" keeps the checkpoint's dtype.
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/Gemma-2B-Tele", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Gemma-2B-Tele")

prompt = "Shannon capacity is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# Strip the prompt tokens so only the completion is decoded.
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```

#### Running the model on a single / multi GPU

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the weights across available GPUs (requires `accelerate`).
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/Gemma-2B-Tele", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/Gemma-2B-Tele")

prompt = "Shannon capacity is"
# Move the tokenized prompt to the GPU before generation.
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)

# Strip the prompt tokens so only the completion is decoded.
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```
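
By default, `generate` decodes greedily; for more varied completions, sampling parameters can be passed instead. Continuing from the snippet above, with illustrative (not tuned) values:

```python
# Sampled decoding instead of greedy; temperature/top_p values are illustrative.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True,
                         temperature=0.7, top_p=0.9)
```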

## Citation

You can find the paper with all details about the model at https://arxiv.org/abs/2409.05314. Please cite it as follows:

```bibtex
@misc{maatouk2024telellmsseriesspecializedlarge,
      title={Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications},
      author={Ali Maatouk and Kenny Chirino Ampudia and Rex Ying and Leandros Tassiulas},
      year={2024},
      eprint={2409.05314},
      archivePrefix={arXiv},
      primaryClass={cs.IT},
      url={https://arxiv.org/abs/2409.05314},
}
```