smol_llama-220M-GQA-32k-theta-sft

An experimental model intended to serve as a long-context draft model for speculative decoding.
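In speculative decoding, a small draft model proposes several tokens and a larger target model verifies them, so the target can accept multiple tokens per step. The following is a minimal pure-Python sketch of the greedy variant only; sampling-based variants use a rejection step instead. The `target` and `draft` functions are hypothetical toy stand-ins for real models, not this model's API.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
# The toy target/draft pair below is hypothetical and stands in for real LLMs.
NextToken = Callable[[List[int]], int]


def speculative_greedy(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position
        #    (a real system scores all k positions in a single forward pass).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted: the target contributes one extra token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + max_new]


def greedy(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding with a single model, used as a reference."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq


def toy_target(seq: List[int]) -> int:
    # Toy target: counts upward modulo 10.
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    # Toy draft: agrees with the target except when len(seq) is a multiple of 3.
    return (seq[-1] + 1) % 10 if len(seq) % 3 else 0


print(speculative_greedy(toy_target, toy_draft, [0], 12))
```

Because every kept token is either confirmed or supplied by the target, the output matches what the target would produce greedily on its own; a good draft model only reduces how many target steps are needed.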

Created by finetuning Doctor-Shotgun/smol_llama-220M-GQA-32k-theta at a 32768-token context length on several instruction datasets.

This variant extends the context window using the RoPE theta method, i.e. by adjusting the rotary position embedding frequency base (rope theta).
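RoPE rotates each pair of query/key dimensions at a frequency of θ^(-2i/d), so raising the base θ slows every rotation and stretches the longest wavelength, which is the usual idea behind theta-based context extension. The sketch below compares two bases; `HEAD_DIM = 64` and both θ values are illustrative assumptions, since this card does not state the values actually used.

```python
import math
from typing import List


def rope_inv_freq(head_dim: int, theta: float) -> List[float]:
    """Rotation frequency for each dimension pair i: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Positions needed for the slowest-rotating pair to complete one full turn."""
    return 2 * math.pi / min(rope_inv_freq(head_dim, theta))


# Illustrative values only (assumptions): the card gives neither head_dim nor theta.
HEAD_DIM = 64
for base in (10_000.0, 1_000_000.0):
    wl = longest_wavelength(HEAD_DIM, base)
    print(f"theta={base:>12,.0f}  longest wavelength ~ {wl:,.0f} positions")
```

A larger base leaves the fastest-rotating pair unchanged (frequency 1) while slowing the others, so nearby positions stay distinguishable while distant ones alias less.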

The model was trained on the Alpaca instruction format:

### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
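A small helper can fill this template before generation. This is a sketch under assumptions: the function name `build_alpaca_prompt` is hypothetical, omitting the `### Input:` section when there is no user input follows common Alpaca practice rather than anything stated here, and the exact whitespace used during training is not documented.

```python
from typing import Optional


def build_alpaca_prompt(instruction: str, user_input: Optional[str] = None) -> str:
    """Fill the Alpaca template; the model continues the text after '### Response:'."""
    parts = [f"### Instruction:\n{instruction}\n"]
    # Assumption: drop the Input section entirely when there is no user input.
    if user_input:
        parts.append(f"### Input:\n{user_input}\n")
    parts.append("### Response:\n")
    # Joining with "\n" leaves one blank line between sections, as in the template.
    return "\n".join(parts)


print(build_alpaca_prompt("Summarize the text.", "Speculative decoding uses a draft model."))
```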

