---
base_model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
library_name: peft
license: apache-2.0
datasets:
  - Respair/sharegpt_chatml_compressed
  - diwank/llmlingua-compressed-text
  - AlexMaclean/wikipedia-deletion-compressions
  - AlexMaclean/all-deletion-compressions
  - sentence-transformers/sentence-compression
language:
  - en
tags:
  - compression
  - pytorch
  - facebook
  - meta
  - llama
  - llama-3
pipeline_tag: text-generation
---

Model Card for aoxo/llama-token-compressor

Memories - Token Compressor for Long-Range Dependency Conversations

Model Details

Model Description

This model is a fine-tuned version of the Llama 3.1 8B 4-bit model, specifically trained for token compression tasks. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning while maintaining the base model's performance.

  • Developed by: Alosh Denny
  • Funded by: EmelinLabs
  • Shared by: EmelinLabs
  • Model type: Token Compressor for Memories
  • Language(s) (NLP): English
  • License: apache-2.0

Uses

Direct Use

This model is designed for token compression tasks. It can be used to generate more concise versions of input text while preserving the essential meaning.

Downstream Use

The compressed outputs from this model can be used in various NLP applications where text length is a constraint, such as summarization, efficient text storage, or as input for other language models with token limits.

Out-of-Scope Use

This model should not be used for tasks that require full preservation of the original text or where nuanced details are critical. It's not suitable for legal, medical, or other domains where precise wording is essential.

Bias, Risks, and Limitations

  • The model may inadvertently remove important context or nuance during compression.
  • There might be biases inherited from the base Llama 3.1 model or introduced during fine-tuning.
  • The model's performance may vary depending on the input text's domain or complexity.

Recommendations

  • Users should review the compressed outputs for accuracy and appropriateness before use in critical applications.
  • It's advisable to test the model on a diverse range of inputs to understand its performance across different text types and domains.

How to Get Started with the Model

Use the code below to get started with the model.

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the adapter config to discover the base model it was trained on
config = PeftConfig.from_pretrained("aoxo/llama-token-compressor")

# Load the 4-bit base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, "aoxo/llama-token-compressor")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
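
A minimal inference sketch follows. The prompt template is an assumption (the exact format used during fine-tuning is not documented here), so adapt it to your data:

# NOTE: hypothetical prompt format -- adjust to match the training template.
text = "The rise of e-commerce has transformed the retail landscape."
prompt = f"Compress the following text:\n{text}\nCompressed:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))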

Training Details

Training Data

The model was trained on a dataset compiled from various sources, including:

  • Respair/sharegpt_chatml_compressed
  • diwank/llmlingua-compressed-text
  • AlexMaclean/wikipedia-deletion-compressions
  • AlexMaclean/all-deletion-compressions
  • sentence-transformers/sentence-compression

Training Procedure

Preprocessing

Prompt-response pairs were extracted from the source datasets and compiled into a single dataset (available at https://huggingface.co/datasets/aoxo/token_compressor). Unwanted characters, trailing whitespace, and stray quotation marks were removed; a rough sketch of this cleaning pass is shown below.
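
The following is a minimal sketch of the described preprocessing. The exact cleaning rules are not published, so treat the regular expressions as illustrative assumptions:

import re
from datasets import load_dataset

# Load the compiled compression dataset referenced above
ds = load_dataset("aoxo/token_compressor")

def clean(text: str) -> str:
    # Drop stray quotation marks ("inverted commas"); the full set of
    # "unwanted characters" removed during training is not documented.
    text = re.sub(r'["“”‘’]', "", text)
    # Strip trailing whitespace from every line
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)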

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Optimizer: paged_adamw_8bit
  • Learning rate: 2e-4
  • LR scheduler: cosine
  • Batch size: 4 per device
  • Gradient accumulation steps: 16
  • Number of epochs: 10
  • Max steps: 175,118
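
For reference, a sketch of how these values map onto Hugging Face TrainingArguments (output_dir is a placeholder; the original training script is not published):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-token-compressor",   # placeholder, not the original path
    bf16=True,                             # bf16 mixed precision
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=10,
    max_steps=175_118,                     # overrides num_train_epochs in HF Trainer
)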

LoRA Configuration

  • r: 8
  • lora_alpha: 16
  • lora_dropout: 0.05
  • bias: none
  • task_type: CAUSAL_LM
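
The equivalent PEFT configuration, as a sketch; target_modules is an assumption (the attention projections typically targeted for Llama models), since the card does not list them:

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not documented
)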

Speeds, Sizes, Times

  • Total Training Compute Throughput: 8.62 PFLOPS
  • Total Logged Training Time: 1422.31 hours
  • Start Time: 07-21-2024 02:02:32
  • End Time: 09-18-2024 08:21:08
  • Checkpoint Size (Adapter): 13,648,432 bytes (≈13.6 MB)

Evaluation Data, Factors & Results

Evaluation

  • Total Evaluation Compute Throughput: 14.34 GFLOPS
  • Total Logged Evaluation Time: 34.25 minutes
  • Start Time: 09-18-2024 08:23:11
  • End Time: 09-18-2024 08:57:26

Evaluation Data

Evaluation was performed on a subset of the following dataset:

  • sentence-transformers/sentence-compression

Results

To demonstrate the model's performance, we've tested it on prompts of varying lengths. The results show how the model compresses texts of different sizes while maintaining the core meaning.

Example 1: Long Paragraph

Input: The impact of artificial intelligence on modern society is a topic of intense debate and speculation. As AI technologies continue to advance at an unprecedented pace, they are reshaping industries, transforming job markets, and altering the way we interact with machines and each other. Proponents argue that AI has the potential to solve some of humanity's most pressing challenges, from climate change to disease diagnosis. They envision a future where AI assistants enhance human productivity, autonomous vehicles reduce traffic accidents, and machine learning algorithms make breakthrough discoveries in science and medicine. However, critics warn of potential downsides, including job displacement, privacy concerns, and the ethical implications of delegating important decisions to machines. There are also fears about the long-term consequences of creating superintelligent AI systems that could potentially outstrip human control. As we navigate this complex landscape, it becomes increasingly important to develop robust governance frameworks and ethical guidelines to ensure that AI development aligns with human values and benefits society as a whole.

Output: AI's rapid advancement is reshaping society, offering solutions to major challenges but raising concerns about job displacement, privacy, and ethics. Balancing AI's potential with its risks requires careful governance and ethical guidelines.

Example 2: Medium-Length Paragraph

Input: The evolution of social media platforms has dramatically altered the landscape of human communication and information sharing. What began as simple networking sites have grown into complex ecosystems that influence everything from personal relationships to global politics. While social media has enabled unprecedented connectivity and democratized information access, it has also given rise to challenges such as misinformation spread, privacy breaches, and addictive design patterns. As these platforms continue to evolve, there is an ongoing debate about their role in society and the need for regulation to address their impact on mental health, democracy, and social cohesion.

Output: Social media has transformed communication, offering connectivity but also causing issues like misinformation and privacy concerns. Its evolving role in society sparks debates on regulation and impact on mental health and democracy.

Example 3: Short Paragraph

Input: Climate change is one of the most pressing issues of our time, with far-reaching consequences for ecosystems, economies, and human societies worldwide. Rising global temperatures are leading to more frequent extreme weather events, sea level rise, and shifts in wildlife populations and vegetation patterns. Addressing this challenge requires a coordinated global effort to reduce greenhouse gas emissions and transition to sustainable energy sources.

Output: Climate change, a critical global issue, causes extreme weather, rising seas, and ecosystem shifts. Tackling it needs worldwide cooperation to cut emissions and adopt sustainable energy.

Example 4: Brief Statement

Input: The rise of e-commerce has transformed the retail landscape, offering consumers unprecedented convenience and choice while posing challenges for traditional brick-and-mortar stores.

Output: E-commerce growth offers consumer convenience, challenging traditional stores.

Environmental Impact

Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: RTX 4000 SFF 20GB Ada Generation Graphics Card
  • Hours used: 1423
  • Cloud Provider: Private Infrastructure
  • Compute Region: Kochi, India (Asia Pacific)
  • Carbon Emitted: 458.21 kg CO2

Technical Specifications

Model Architecture and Objective

The model uses the Llama 3.1 8B architecture with 4-bit quantization. It was fine-tuned using LoRA for the task of token compression.
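
For illustration, the 4-bit (NF4) quantization that bitsandbytes applies could be expressed as below. This is a sketch: the unsloth base checkpoint already ships pre-quantized, and meta-llama/Meta-Llama-3.1-8B is shown only as the assumed full-precision source.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Typical NF4 settings; shows how an equivalent 4-bit model would be
# produced from the full-precision weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",  # assumed full-precision source
    quantization_config=bnb_config,
)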

Compute Infrastructure

Hardware

RTX 4000 SFF 20GB Ada Generation Graphics Card

Software

  • Hugging Face Transformers
  • PEFT (Parameter-Efficient Fine-Tuning)
  • Accelerate
  • bitsandbytes
  • TRL (Transformer Reinforcement Learning)

Model Card Contact

aloshdeny@gmail.com

Framework versions

  • PEFT 0.12.0