ndebuhr/Gemma-2-27B-Technical-Tutorial-Summarization-QLoRA


ndebuhr committed on Jul 22

Commit 9039939 • 1 Parent(s): cae6d67

Update README.md

Files changed (1)

README.md +66 -1


@@ -12,12 +12,77 @@ tags:
 - sft
 ---
-# Uploaded  model
 - **Developed by:** ndebuhr
 - **License:** apache-2.0
 - **Finetuned from model :** unsloth/gemma-2-27b-it-bnb-4bit
 This gemma2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 - sft
 ---
+# Model Specifications
+- **Max Sequence Length**: Trained at 16,384 tokens (via RoPE scaling)
+- **Data Type**: Auto-detected, with options for float16 and bfloat16
+- **Quantization**: 4-bit, to reduce memory usage
+## Training Data
+Trained on a private dataset of hundreds of technical tutorials paired with their summaries.
+## Implementation Highlights
+- **Efficiency**: 4-bit quantization reduces memory usage and shortens download times.
+- **Adaptability**: Automatic data type detection, plus support for advanced configuration options such as RoPE scaling, LoRA, and gradient checkpointing.
+# Uploaded Model
 - **Developed by:** ndebuhr
 - **License:** apache-2.0
 - **Finetuned from model :** unsloth/gemma-2-27b-it-bnb-4bit
+# Configuration and Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Replace with the raw tutorial transcript to summarize
+input_text = ""
+# Set device based on CUDA availability
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Load the model and tokenizer
+model_name = "ndebuhr/Gemma-2-27B-Technical-Tutorial-Summarization-QLoRA"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
+instruction = "Clarify and summarize this tutorial transcript"
+prompt = """{}
+### Raw Transcript:
+{}
+### Summary:
+"""
+# Tokenize the input text
+inputs = tokenizer(
+    prompt.format(instruction, input_text),
+    return_tensors="pt",
+    truncation=True,
+    max_length=16384
+).to(device)
+# Generate outputs
+outputs = model.generate(
+    **inputs,
+    max_length=16384,
+    num_return_sequences=1,
+    use_cache=True
+)
+# Decode the generated text
+generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+```
+## Compute Infrastructure
+* Fine-tuning: 1x A100 (40 GB)
+* Inference: 1x L4 (24 GB) recommended
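
As a rough check on the inference recommendation above, the weight footprint can be estimated from the parameter count and the bits per parameter. The sketch below is back-of-the-envelope arithmetic only: it takes the "27B" in the model name at face value and ignores quantization metadata, the KV cache, and activation memory, so real usage will be higher. The function name `weight_memory_gb` is hypothetical and not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: the nominal "27B" from the model name, not an exact count.
NOMINAL_PARAMS = 27e9

four_bit_gb = weight_memory_gb(NOMINAL_PARAMS, 4)
half_precision_gb = weight_memory_gb(NOMINAL_PARAMS, 16)

print(f"4-bit weights:  ~{four_bit_gb:.1f} GB")
print(f"16-bit weights: ~{half_precision_gb:.1f} GB")
```

Under these assumptions the 4-bit weights come to roughly 13.5 GB, which leaves headroom on a 24 GB card, while 16-bit weights (about 54 GB) would not fit on a single 24 GB or 40 GB GPU.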
 This gemma2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
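
The usage snippet added in this commit builds its prompt from a fixed template and decodes the full output sequence. For a decoder-only model, that decoded text includes the prompt as well as the generated summary. The sketch below wraps the same template in two small helpers: one to build the prompt and one to pull out just the summary. The names `build_prompt` and `extract_summary` are hypothetical and do not appear in the README. The extraction assumes the generated summary does not itself contain the `### Summary:` marker.

```python
# Template and instruction copied from the README's usage snippet.
PROMPT_TEMPLATE = """{}
### Raw Transcript:
{}
### Summary:
"""
DEFAULT_INSTRUCTION = "Clarify and summarize this tutorial transcript"
SUMMARY_MARKER = "### Summary:"


def build_prompt(transcript: str, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Fill the README's prompt template with an instruction and a transcript."""
    return PROMPT_TEMPLATE.format(instruction, transcript)


def extract_summary(decoded: str) -> str:
    """Return the text after the last summary marker, or all of it if no marker."""
    _, marker, tail = decoded.rpartition(SUMMARY_MARKER)
    return tail.strip() if marker else decoded.strip()


# Stand-in for a decoded model output: the prompt followed by generated text.
example_output = build_prompt("First install X. Then run Y.") + "Install X, then run Y.\n"
print(extract_summary(example_output))
```

With the README's snippet, `extract_summary` would be applied to each string in the list returned by `tokenizer.batch_decode`.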