---
base_model:
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
---

# Llama3-8B-Instruct-Slerp

Llama3-8B-Instruct-Slerp is a SLERP merge of the following models, created with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [yuvraj17/EvolCodeLlama-3.1-8B-Instruct](https://huggingface.co/yuvraj17/EvolCodeLlama-3.1-8B-Instruct)
* [yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1](https://huggingface.co/yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
        layer_range: [0, 32]
      - model: yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
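
In this configuration, `t` controls the interpolation between the two models (roughly, `t = 0` keeps the base model's weights and `t = 1` the other model's), with separate schedules for the self-attention and MLP sub-layers. As a rough illustration of the mechanics, and not mergekit's exact implementation, the sketch below shows SLERP between two weight tensors plus the piecewise-linear schedule that spreads the anchor values in `t` across the 32 layers (the helper names are hypothetical):

```python
# Illustrative sketch only: mergekit's real SLERP handles normalization and
# degenerate cases differently. `slerp` and `t_for_layer` are hypothetical helpers.
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two weight tensors at fraction t."""
    a, b = v0.ravel(), v1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))  # angle between the tensors
    if omega < eps:
        # Nearly parallel tensors: plain linear interpolation is numerically safer.
        return (1.0 - t) * v0 + t * v1
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * v0 + (np.sin(t * omega) / s) * v1

def t_for_layer(layer: int, anchors: list[float], num_layers: int = 32) -> float:
    """Spread the anchor values evenly over the layer range, then interpolate."""
    positions = np.linspace(0, num_layers - 1, num=len(anchors))
    return float(np.interp(layer, positions, anchors))

# Self-attention blend factor at layer 8 under the schedule above:
print(t_for_layer(8, [0, 0.5, 0.3, 0.7, 1]))  # ~0.49: a near-equal blend
```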

## 💻 Usage

```python
# pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-Instruct-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template before generating.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the merged model in float16 and shard it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
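
Because `do_sample=True`, generations vary between runs; lower the `temperature` or set `do_sample=False` for more deterministic output. Note that the pipeline's `generated_text` includes the prompt by default; pass `return_full_text=False` to receive only the completion.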

## 🏆 Evaluation Scores

### Nous

|Model|AGIEval|TruthfulQA|BigBench|
|---|---:|---:|---:|
|[yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp)|38.32|57.15|43.91|


#### AGIEval
|             Task             |Version| Metric  | Value |   | Stderr |
|------------------------------|------:|---------|------:|---|-------:|
| agieval_aqua_rat              |      0| acc     | 23.62 |±  |  2.67  |
|                              |       | acc_norm| 22.05 |±  |  2.61  |
| agieval_logiqa_en             |      0| acc     | 27.50 |±  |  1.75  |
|                              |       | acc_norm| 31.80 |±  |  1.83  |
| agieval_lsat_ar               |      0| acc     | 21.30 |±  |  2.71  |
|                              |       | acc_norm| 20.87 |±  |  2.69  |
| agieval_lsat_lr               |      0| acc     | 35.29 |±  |  2.12  |
|                              |       | acc_norm| 37.65 |±  |  2.15  |
| agieval_lsat_rc               |      0| acc     | 42.01 |±  |  3.01  |
|                              |       | acc_norm| 39.78 |±  |  2.99  |
| agieval_sat_en                |      0| acc     | 55.83 |±  |  3.47  |
|                              |       | acc_norm| 50.49 |±  |  3.49  |
| agieval_sat_en_without_passage|      0| acc     | 36.89 |±  |  3.37  |
|                              |       | acc_norm| 34.95 |±  |  3.33  |
| agieval_sat_math              |      0| acc     | 29.55 |±  |  3.08  |
|                              |       | acc_norm| 28.64 |±  |  3.05  |

**Average score**: 33.28% (mean of the `acc_norm` values above)

#### TruthfulQA

|        Task         |Version| Metric | Value |   | Stderr |
|---------------------|------:|--------|------:|---|-------:|
| truthfulqa_mc       |      1| mc1    | 33.54 |±  |  1.65  |
|                     |       | mc2    | 49.78 |±  |  1.53  |

**Average score**: 49.78%

#### BigBench

|                Task                |Version|        Metric         | Value |   | Stderr |
|------------------------------------|------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement          |      0| multiple_choice_grade  | 47.89 |±  |  3.63  |
| bigbench_date_understanding        |      0| multiple_choice_grade  | 39.02 |±  |  2.54  |
| bigbench_disambiguation_qa         |      0| multiple_choice_grade  | 33.72 |±  |  2.95  |
| bigbench_geometric_shapes          |      0| multiple_choice_grade  | 20.61 |±  |  2.14  |
| bigbench_logical_deduction_five_objects|  0| multiple_choice_grade  | 31.40 |±  |  2.08  |
| bigbench_logical_deduction_seven_objects| 0| multiple_choice_grade  | 23.71 |±  |  1.61  |
| bigbench_logical_deduction_three_objects| 0| multiple_choice_grade  | 47.00 |±  |  2.89  |
| bigbench_movie_recommendation      |      0| multiple_choice_grade  | 27.40 |±  |  1.99  |
| bigbench_navigate                  |      0| multiple_choice_grade  | 50.10 |±  |  1.58  |
| bigbench_reasoning_about_colored_objects| 0| multiple_choice_grade  | 38.40 |±  |  1.09  |
| bigbench_ruin_names                |      0| multiple_choice_grade  | 27.23 |±  |  2.11  |
| bigbench_salient_translation_error_detection| 0| multiple_choice_grade  | 25.45 |±  |  1.38  |
| bigbench_snarks                    |      0| multiple_choice_grade  | 46.41 |±  |  3.72  |
| bigbench_sports_understanding      |      0| multiple_choice_grade  | 50.30 |±  |  1.59  |
| bigbench_temporal_sequences        |      0| multiple_choice_grade  | 37.30 |±  |  1.53  |
| bigbench_tracking_shuffled_objects_five_objects| 0| multiple_choice_grade  | 21.36 |±  |  1.16  |
| bigbench_tracking_shuffled_objects_seven_objects| 0| multiple_choice_grade  | 17.14 |±  |  0.90  |
| bigbench_tracking_shuffled_objects_three_objects| 0| multiple_choice_grade  | 47.00 |±  |  2.89  |

**Average score**: 35.38%
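
The tables above are in lm-evaluation-harness output format (the Nous benchmark suite). For reference, here is a rough sketch of re-scoring one of these tasks with the harness's Python API, assuming a recent `lm-eval` (v0.4 or later). The tables use an older fork's task names (e.g. `truthfulqa_mc`), so task names and scores may not line up exactly with current releases:

```python
# Hedged reproduction sketch: harness versions differ in task names and
# scoring details, so results may not match the tables above exactly.
# pip install -qU lm-eval

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yuvraj17/Llama3-8B-Instruct-Slerp,dtype=float16",
    tasks=["truthfulqa_mc2"],  # the older harness reported this as truthfulqa_mc / mc2
    batch_size=8,
)
print(results["results"]["truthfulqa_mc2"])
```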