Quantization made by Richard Erkhov.

TinyLlama-1.1B-Tele - GGUF

Model creator: https://huggingface.co/AliMaatouk/
Original model: https://huggingface.co/AliMaatouk/TinyLlama-1.1B-Tele/

Name	Quant method	Size
TinyLlama-1.1B-Tele.Q2_K.gguf	Q2_K	0.4GB
TinyLlama-1.1B-Tele.IQ3_XS.gguf	IQ3_XS	0.44GB
TinyLlama-1.1B-Tele.IQ3_S.gguf	IQ3_S	0.47GB
TinyLlama-1.1B-Tele.Q3_K_S.gguf	Q3_K_S	0.47GB
TinyLlama-1.1B-Tele.IQ3_M.gguf	IQ3_M	0.48GB
TinyLlama-1.1B-Tele.Q3_K.gguf	Q3_K	0.51GB
TinyLlama-1.1B-Tele.Q3_K_M.gguf	Q3_K_M	0.51GB
TinyLlama-1.1B-Tele.Q3_K_L.gguf	Q3_K_L	0.55GB
TinyLlama-1.1B-Tele.IQ4_XS.gguf	IQ4_XS	0.57GB
TinyLlama-1.1B-Tele.Q4_0.gguf	Q4_0	0.59GB
TinyLlama-1.1B-Tele.IQ4_NL.gguf	IQ4_NL	0.6GB
TinyLlama-1.1B-Tele.Q4_K_S.gguf	Q4_K_S	0.6GB
TinyLlama-1.1B-Tele.Q4_K.gguf	Q4_K	0.62GB
TinyLlama-1.1B-Tele.Q4_K_M.gguf	Q4_K_M	0.62GB
TinyLlama-1.1B-Tele.Q4_1.gguf	Q4_1	0.65GB
TinyLlama-1.1B-Tele.Q5_0.gguf	Q5_0	0.71GB
TinyLlama-1.1B-Tele.Q5_K_S.gguf	Q5_K_S	0.71GB
TinyLlama-1.1B-Tele.Q5_K.gguf	Q5_K	0.73GB
TinyLlama-1.1B-Tele.Q5_K_M.gguf	Q5_K_M	0.73GB
TinyLlama-1.1B-Tele.Q5_1.gguf	Q5_1	0.77GB
TinyLlama-1.1B-Tele.Q6_K.gguf	Q6_K	0.84GB
TinyLlama-1.1B-Tele.Q8_0.gguf	Q8_0	1.09GB
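As a rough guide to choosing among the files above, the following sketch picks the largest quantization that fits a size budget and then runs a completion with it. It makes several assumptions that are not part of this repository: the helper names pick_quant and load_and_complete are hypothetical, the llama-cpp-python package is installed, and the chosen .gguf file has already been downloaded to the working directory. File size is only a proxy for memory use, since the runtime also needs room for the KV cache.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 0.4, "IQ3_XS": 0.44, "IQ3_S": 0.47, "Q3_K_S": 0.47,
    "IQ3_M": 0.48, "Q3_K": 0.51, "Q3_K_M": 0.51, "Q3_K_L": 0.55,
    "IQ4_XS": 0.57, "Q4_0": 0.59, "IQ4_NL": 0.6, "Q4_K_S": 0.6,
    "Q4_K": 0.62, "Q4_K_M": 0.62, "Q4_1": 0.65, "Q5_0": 0.71,
    "Q5_K_S": 0.71, "Q5_K": 0.73, "Q5_K_M": 0.73, "Q5_1": 0.77,
    "Q6_K": 0.84, "Q8_0": 1.09,
}


def pick_quant(budget_gb):
    """Return the name of the largest quant whose file fits in budget_gb."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items()
               if size <= budget_gb}
    if not fitting:
        raise ValueError(f"no quantization fits in {budget_gb} GB")
    # On ties, max() keeps the first entry in table order.
    return max(fitting, key=fitting.get)


def load_and_complete(quant, prompt, max_tokens=100):
    """Run a text completion with a locally downloaded GGUF file.

    Assumption: llama-cpp-python is installed and the file sits in the
    current directory under the name used in the table.
    """
    from llama_cpp import Llama  # imported lazily; optional dependency

    llm = Llama(model_path=f"TinyLlama-1.1B-Tele.{quant}.gguf", n_ctx=2048)
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]
```

For example, pick_quant(0.5) selects IQ3_M, and load_and_complete(pick_quant(0.5), "Shannon capacity is") would then run the completion, provided the assumptions above hold.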

Original model description:

license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- nlp

TinyLlama-1.1B-Tele Model Card

Model Summary

TinyLlama-1.1B-Tele is a Transformer language model with 1.1 billion parameters, specialized in telecommunications. It is based on TinyLlama-1.1B and was continually pretrained on Tele-Data, a large-scale dataset of approximately 2.5 billion tokens of telecommunications material, including articles, standards, and general web content related to the telecommunications domain.

When assessed against telecommunications benchmarks such as Tele-Eval, TinyLlama-1.1B-Tele outperforms TinyLlama-1.1B by several percentage points. Additionally, TinyLlama-1.1B-Tele matches TinyLlama-1.1B across benchmarks related to common sense, language understanding, and logical reasoning. The domain adaptation was therefore achieved with minimal loss of the original model's general capabilities.

Context Length

The model was trained with a context length of 2048 tokens.
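In practice, the prompt and the generated tokens together should stay within this window. The sketch below shows one way to trim an over-long prompt before generation. The helper name fit_to_context is hypothetical, and keeping the most recent tokens (rather than the earliest ones) is an assumption about what matters for a completion task.

```python
def fit_to_context(token_ids, max_new_tokens, context_len=2048):
    """Trim a list of prompt token ids so the prompt plus the requested
    number of new tokens fits inside the context window.

    Keeps the tail of the prompt, since a completion model continues
    from the last tokens it sees.
    """
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from -budget returns the whole list when it is already short enough.
    return token_ids[-budget:]
```

With max_new_tokens=100, a 3000-token prompt is cut down to its last 1948 tokens, while a short prompt is returned unchanged.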

Usage

TinyLlama-1.1B-Tele is a base model best suited for fine-tuning on applications related to telecommunications. It has not been fine-tuned to follow instructions and operates solely as a text completion model. An example completion is shown below:

Prompt: Shannon capacity is

Model: the capacity of a noiseless communication channel with a memoryless source. The Shannon capacity is a measure of the information rate that can be reliably transmitted over a noiseless channel.

The instruct version of this model is available as TinyLlama-1.1B-Tele-it.
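Because the base model only continues text, a question is usually better posed as the start of a pattern than as an instruction. The following sketch builds such a few-shot completion prompt. The function name build_few_shot_prompt and the "Question:/Answer:" layout are illustrative assumptions, not a format prescribed by the model card.

```python
def build_few_shot_prompt(examples, question):
    """Format solved examples followed by an open question, so that a
    completion model is nudged to continue with the missing answer."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # Leave the final answer empty for the model to fill in.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


examples = [
    ("What does OFDM stand for?",
     "Orthogonal frequency-division multiplexing."),
    ("What does MIMO stand for?",
     "Multiple-input multiple-output."),
]
prompt = build_few_shot_prompt(examples, "What does QAM stand for?")
print(prompt)
```

The resulting string can be passed as the prompt in any of the generation snippets in this document.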

Sample Code

Below are some code snippets to get started quickly with running the model. First, install transformers (pip install transformers) along with PyTorch, then copy the snippet corresponding to your hardware and adapt it to your use case.

Running the model on a CPU

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer; torch_dtype="auto" uses the dtype recorded in the checkpoint
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/TinyLlama-1.1B-Tele", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/TinyLlama-1.1B-Tele")

# Tokenize the prompt and generate up to 100 new tokens
prompt = "Shannon capacity is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# generate() returns the prompt followed by the continuation, so drop the prompt tokens
prompt_length = inputs["input_ids"].shape[1]
generated_tokens = outputs[0, prompt_length:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)

Running the model on a single GPU or multiple GPUs

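from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the weights on the available GPU(s); it requires the accelerate package
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/TinyLlama-1.1B-Tele", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/TinyLlama-1.1B-Tele")

# Tokenize the prompt and move the tensors to the device that holds the model's first layers
prompt = "Shannon capacity is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

# generate() returns the prompt followed by the continuation, so drop the prompt tokens
prompt_length = inputs["input_ids"].shape[1]
generated_tokens = outputs[0, prompt_length:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)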

Citation

You can find the paper with all details about the model at https://arxiv.org/abs/2409.05314. Please cite it as follows:

@misc{maatouk2024telellmsseriesspecializedlarge,
      title={Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications}, 
      author={Ali Maatouk and Kenny Chirino Ampudia and Rex Ying and Leandros Tassiulas},
      year={2024},
      eprint={2409.05314},
      archivePrefix={arXiv},
      primaryClass={cs.IT},
      url={https://arxiv.org/abs/2409.05314}, 
}