---
base_model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
library_name: peft
license: apache-2.0
datasets:
- Respair/sharegpt_chatml_compressed
- diwank/llmlingua-compressed-text
- AlexMaclean/wikipedia-deletion-compressions
- AlexMaclean/all-deletion-compressions
- sentence-transformers/sentence-compression
language:
- en
tags:
- compression
- pytorch
- facebook
- meta
- llama
- llama-3
pipeline_tag: text-generation
---

# Model Card for aoxo/llama-token-compressor

Memories - Token Compressor for Long-Range Dependency Conversations

## Model Details

### Model Description

This model is a fine-tuned version of the Llama 3.1 8B 4-bit model, specifically trained for token compression tasks. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning while maintaining the base model's performance.

- **Developed by:** Alosh Denny
- **Funded by:** EmelinLabs
- **Shared by:** EmelinLabs
- **Model type:** Token Compressor for Memories
- **Language(s) (NLP):** English
- **License:** apache-2.0

## Uses

### Direct Use

This model is designed for token compression tasks. It generates a more concise version of the input text while preserving its essential meaning.

### Downstream Use

The compressed outputs from this model can be used in NLP applications where text length is a constraint, such as summarization, efficient text storage, or as input for other language models with token limits.

### Out-of-Scope Use

This model should not be used for tasks that require full preservation of the original text or where nuanced details are critical. It is not suitable for legal, medical, or other domains where precise wording is essential.

## Bias, Risks, and Limitations

- The model may inadvertently remove important context or nuance during compression.
- Biases may be inherited from the base Llama 3.1 model or introduced during fine-tuning.
- Performance may vary depending on the domain and complexity of the input text.

### Recommendations

- Users should review the compressed outputs for accuracy and appropriateness before use in critical applications.
- It is advisable to test the model on a diverse range of inputs to understand its performance across different text types and domains.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the adapter configuration and the 4-bit quantized base model
config = PeftConfig.from_pretrained("aoxo/llama-token-compressor")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Meta-Llama-3.1-8B-bnb-4bit")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B-bnb-4bit")

# Attach the token-compressor LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "aoxo/llama-token-compressor")
```

## Training Details

### Training Data

The model was trained on a dataset compiled from the following sources:

- Respair/sharegpt_chatml_compressed
- diwank/llmlingua-compressed-text
- AlexMaclean/wikipedia-deletion-compressions
- AlexMaclean/all-deletion-compressions
- sentence-transformers/sentence-compression

### Training Procedure

#### Preprocessing

Prompt-response pairs were extracted from the datasets above and compiled into a single dataset (available at https://huggingface.co/datasets/aoxo/token_compressor). Unwanted characters, trailing whitespace, and inverted commas were removed.
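The cleaning step can be illustrated with a minimal sketch like the one below. This is not the exact preprocessing script: the `train` split name, the choice of characters to strip, and the per-column handling are assumptions.

```python
import re
from datasets import load_dataset

def clean_text(text: str) -> str:
    """Drop control characters and inverted commas, and trim trailing whitespace."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)  # non-printable control characters
    text = re.sub(r"[\"\u201c\u201d]", "", text)               # straight and curly double quotes
    return "\n".join(line.rstrip() for line in text.splitlines()).strip()

# Apply the cleaning to every string column of the compiled dataset (split name assumed)
dataset = load_dataset("aoxo/token_compressor", split="train")
dataset = dataset.map(
    lambda row: {k: clean_text(v) for k, v in row.items() if isinstance(v, str)}
)
```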
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Optimizer:** paged_adamw_8bit
- **Learning rate:** 2e-4
- **LR scheduler:** cosine
- **Batch size:** 4 per device
- **Gradient accumulation steps:** 16
- **Number of epochs:** 10
- **Max steps:** 175,118

#### LoRA Configuration

- **r:** 8
- **lora_alpha:** 16
- **lora_dropout:** 0.05
- **bias:** none
- **task_type:** CAUSAL_LM

#### Speeds, Sizes, Times

- **Total Training Compute Throughput:** 8.62 PFLOPS
- **Total Logged Training Time:** 1422.31 hours
- **Start Time:** 07-21-2024 02:02:32
- **End Time:** 09-18-2024 08:21:08
- **Checkpoint Size (Adapter):** 13,648,432 bytes

## Evaluation

- **Total Evaluation Compute Throughput:** 14.34 GFLOPS
- **Total Logged Evaluation Time:** 34.25 minutes
- **Start Time:** 09-18-2024 08:23:11
- **End Time:** 09-18-2024 08:57:26

### Evaluation Data

Evaluation was performed on a subset of the following dataset:

- sentence-transformers/sentence-compression

### Results

To demonstrate the model's performance, we tested it on prompts of varying lengths. The results show how the model compresses texts of different sizes while maintaining the core meaning.

#### Example 1: Very Large Paragraph

**Input:** The impact of artificial intelligence on modern society is a topic of intense debate and speculation. As AI technologies continue to advance at an unprecedented pace, they are reshaping industries, transforming job markets, and altering the way we interact with machines and each other. Proponents argue that AI has the potential to solve some of humanity's most pressing challenges, from climate change to disease diagnosis. They envision a future where AI assistants enhance human productivity, autonomous vehicles reduce traffic accidents, and machine learning algorithms make breakthrough discoveries in science and medicine. However, critics warn of potential downsides, including job displacement, privacy concerns, and the ethical implications of delegating important decisions to machines. There are also fears about the long-term consequences of creating superintelligent AI systems that could potentially outstrip human control. As we navigate this complex landscape, it becomes increasingly important to develop robust governance frameworks and ethical guidelines to ensure that AI development aligns with human values and benefits society as a whole.

**Output:** AI's rapid advancement is reshaping society, offering solutions to major challenges but raising concerns about job displacement, privacy, and ethics. Balancing AI's potential with its risks requires careful governance and ethical guidelines.

#### Example 2: Medium-Length Paragraph

**Input:** The evolution of social media platforms has dramatically altered the landscape of human communication and information sharing. What began as simple networking sites have grown into complex ecosystems that influence everything from personal relationships to global politics. While social media has enabled unprecedented connectivity and democratized information access, it has also given rise to challenges such as misinformation spread, privacy breaches, and addictive design patterns. As these platforms continue to evolve, there is an ongoing debate about their role in society and the need for regulation to address their impact on mental health, democracy, and social cohesion.
**Output:** Social media has transformed communication, offering connectivity but also causing issues like misinformation and privacy concerns. Its evolving role in society sparks debates on regulation and impact on mental health and democracy.

#### Example 3: Short Paragraph

**Input:** Climate change is one of the most pressing issues of our time, with far-reaching consequences for ecosystems, economies, and human societies worldwide. Rising global temperatures are leading to more frequent extreme weather events, sea level rise, and shifts in wildlife populations and vegetation patterns. Addressing this challenge requires a coordinated global effort to reduce greenhouse gas emissions and transition to sustainable energy sources.

**Output:** Climate change, a critical global issue, causes extreme weather, rising seas, and ecosystem shifts. Tackling it needs worldwide cooperation to cut emissions and adopt sustainable energy.

#### Example 4: Brief Statement

**Input:** The rise of e-commerce has transformed the retail landscape, offering consumers unprecedented convenience and choice while posing challenges for traditional brick-and-mortar stores.

**Output:** E-commerce growth offers consumer convenience, challenging traditional stores.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** RTX 4000 SFF 20GB Ada Generation Graphics Card
- **Hours used:** 1423
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Kochi, India (Asia Pacific)
- **Carbon Emitted:** 458.21 kg CO2

## Technical Specifications

### Model Architecture and Objective

The model uses the Llama 3.1 8B architecture with 4-bit quantization. It was fine-tuned using LoRA for the task of token compression.

### Compute Infrastructure

#### Hardware

RTX 4000 SFF 20GB Ada Generation Graphics Card

#### Software

- Hugging Face Transformers
- PEFT (Parameter-Efficient Fine-Tuning)
- Accelerate
- bitsandbytes
- TRL (Transformer Reinforcement Learning)

## Model Card Contact

aloshdeny@gmail.com

### Framework versions

- PEFT 0.12.0