Model Card for Llama 3 8B SFT Code Bagel

Model Details

Model Description

This model, Llama3-8B-SFT-code_bagel-bnb-4bit, is a fine-tuned version of Meta-Llama-3-8B-Instruct. It was trained with Supervised Fine-Tuning (SFT) on 35k randomly selected rows from the Replete-AI/code_bagel dataset and then quantized to 4-bit precision with the bitsandbytes (bnb) library. It is optimized for code-related tasks.
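To give a rough sense of what 4-bit quantization buys, the sketch below estimates the memory needed for the weights alone. It assumes the 8.03B parameter count listed for this model and ideal bit packing; it ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real usage will be somewhat higher. The helper name `weight_gb` is made up for illustration.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# Assumption: 8.03e9 parameters, as listed for this model.
PARAMS = 8.03e9


def weight_gb(params: float, bits_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_gb(PARAMS, 4)   # half a byte per parameter

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the FP16 footprint.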

Uses

Coding and code-related tasks.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
import transformers

# Load the tokenizer and model
model_id = "thesven/Llama3-8B-SFT-code_bagel-bnb-4bit"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Write me a python function to turn every other letter in a string to uppercase?",
    },
]

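# Render the chat messages into a single prompt string using the model's chat template
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stop generation at either the regular EOS token or the Llama 3 end-of-turn token
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Sample with a low temperature so the output stays close to deterministic
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.1,
)

# The pipeline returns the prompt plus the completion; print only the completion
print(outputs[0]["generated_text"][len(prompt) :])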
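
For sanity-checking the model's answer to the example prompt, here is a hand-written reference function. It is not model output, and it reflects only one reading of the prompt: it assumes "every other letter" counts alphabetic characters only, starts with the first letter, and leaves all other characters unchanged. The name `alternate_upper` is made up for illustration.

```python
def alternate_upper(text: str) -> str:
    """Uppercase every other letter, starting with the first letter.

    Non-alphabetic characters are copied through and do not advance the count.
    """
    result = []
    letter_index = 0  # counts alphabetic characters only
    for ch in text:
        if ch.isalpha():
            # Even-numbered letters (0, 2, 4, ...) are uppercased
            result.append(ch.upper() if letter_index % 2 == 0 else ch)
            letter_index += 1
        else:
            result.append(ch)
    return "".join(result)


print(alternate_upper("hello world"))
```

A model response that counts by character position instead of by letter, or that starts with the second letter, would also be a reasonable answer to the prompt.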

Model size: 8.03B params (Safetensors, tensor type FP16)