|
---
base_model:
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
---
|
|
|
# Llama3-8B-Instruct-Slerp |
|
|
|
Llama3-8B-Instruct-Slerp is a SLERP merge of the following models, created with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
|
* [yuvraj17/EvolCodeLlama-3.1-8B-Instruct](https://huggingface.co/yuvraj17/EvolCodeLlama-3.1-8B-Instruct) |
|
* [yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1](https://huggingface.co/yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1) |
|
|
|
## 🧩 Configuration |
|
|
|
```yaml
slices:
  - sources:
      - model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
        layer_range: [0, 32]
      - model: yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
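
Here `t` is the SLERP interpolation factor: roughly, `t = 0` keeps the base model's weights (EvolCodeLlama) and `t = 1` takes the other model's, with the five-element lists spread as a gradient across the 32 layers separately for the self-attention and MLP tensors, and a flat `t = 0.5` for everything else. The snippet below is a minimal, illustrative sketch of spherical linear interpolation between two weight tensors; it is not mergekit's actual implementation, which handles normalization and edge cases more carefully.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Illustrative SLERP between two weight tensors (not mergekit's real code)."""
    # Flatten and normalise both tensors so we can treat them as directions.
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_dir = a_flat / (a_flat.norm() + eps)
    b_dir = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(torch.dot(a_dir, b_dir), -1.0, 1.0)
    omega = torch.acos(dot)  # angle between the two weight vectors
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        mixed = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        mixed = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
              + (torch.sin(t * omega) / sin_omega) * b_flat
    return mixed.reshape(a.shape).to(a.dtype)

# Example: blend two random projection matrices with t = 0.3
w_a, w_b = torch.randn(128, 128), torch.randn(128, 128)
merged = slerp(0.3, w_a, w_b)
```

The merge itself can be reproduced by saving the YAML above to `config.yaml` and running something like `mergekit-yaml config.yaml ./Llama3-8B-Instruct-Slerp --copy-tokenizer`; the LazyMergekit Colab linked above wraps this step for you.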
|
|
|
## 💻 Usage |
|
|
|
```python
# Install dependencies (notebook syntax; use `pip install` from a shell otherwise)
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-Instruct-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Text-generation pipeline in float16, placed automatically on available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
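
If you prefer calling `generate` directly instead of the `pipeline` helper, the equivalent sketch below loads the same checkpoint with `AutoModelForCausalLM` (standard `transformers` usage; adjust the dtype and device settings to your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yuvraj17/Llama3-8B-Instruct-Slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "What is a large language model?"}]
# apply_chat_template tokenizes and appends the assistant header so the model starts answering
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
# Strip the prompt tokens and decode only the newly generated continuation
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```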
|
|
|
# 🏆 Evaluation Scores |
|
|
|
## Nous |
|
|
|
| Model | AGIEval | TruthfulQA | BigBench |
|---|---:|---:|---:|
| [yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp) | 33.28 | 49.78 | 35.38 |
|
|
|
|
|
### AGIEval |
|
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 22.05 | ± | 2.61 |
| agieval_logiqa_en | 0 | acc | 27.50 | ± | 1.75 |
| | | acc_norm | 31.80 | ± | 1.83 |
| agieval_lsat_ar | 0 | acc | 21.30 | ± | 2.71 |
| | | acc_norm | 20.87 | ± | 2.69 |
| agieval_lsat_lr | 0 | acc | 35.29 | ± | 2.12 |
| | | acc_norm | 37.65 | ± | 2.15 |
| agieval_lsat_rc | 0 | acc | 42.01 | ± | 3.01 |
| | | acc_norm | 39.78 | ± | 2.99 |
| agieval_sat_en | 0 | acc | 55.83 | ± | 3.47 |
| | | acc_norm | 50.49 | ± | 3.49 |
| agieval_sat_en_without_passage | 0 | acc | 36.89 | ± | 3.37 |
| | | acc_norm | 34.95 | ± | 3.33 |
| agieval_sat_math | 0 | acc | 29.55 | ± | 3.08 |
| | | acc_norm | 28.64 | ± | 3.05 |
|
|
|
**Average score**: 33.28% |
|
|
|
### TruthfulQA |
|
|
|
|
|
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 33.54 | ± | 1.65 |
| | | mc2 | 49.78 | ± | 1.53 |
|
|
|
**Average score**: 49.78% |
|
|
|
### BigBench |
|
|
|
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 47.89 | ± | 3.63 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 39.02 | ± | 2.54 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 33.72 | ± | 2.95 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 20.61 | ± | 2.14 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 31.40 | ± | 2.08 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.71 | ± | 1.61 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 47.00 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 27.40 | ± | 1.99 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 38.40 | ± | 1.09 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 27.23 | ± | 2.11 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 25.45 | ± | 1.38 |
| bigbench_snarks | 0 | multiple_choice_grade | 46.41 | ± | 3.72 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 50.30 | ± | 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 37.30 | ± | 1.53 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.36 | ± | 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.14 | ± | 0.90 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 47.00 | ± | 2.89 |
|
|
|
**Average score**: 35.38% |
|
|
|
|
|
|