---
base_model:
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
---
# Llama3-8B-Instruct-Slerp
Llama3-8B-Instruct-Slerp is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [yuvraj17/EvolCodeLlama-3.1-8B-Instruct](https://huggingface.co/yuvraj17/EvolCodeLlama-3.1-8B-Instruct)
* [yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1](https://huggingface.co/yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1)
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
        layer_range: [0, 32]
      - model: yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
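To make the `slerp` method and the `t` lists above more concrete, here is a minimal sketch of spherical linear interpolation on flattened weight vectors. It is not mergekit's implementation: the helper names `slerp` and `layer_t` are hypothetical, weights are treated as plain lists of floats, and the way the five anchor values are spread linearly across the 32 layers is an assumption about how such a gradient is typically applied.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat weight vectors (t=0 -> v0, t=1 -> v1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1 + eps)
    dot = max(-1.0, min(1.0, dot))
    # Nearly colinear vectors: fall back to plain linear interpolation.
    if abs(dot) > 0.9995:
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def layer_t(anchors, layer_idx, num_layers):
    """Assumed behavior: stretch the anchor list linearly over the layer indices."""
    if num_layers == 1:
        return float(anchors[0])
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1.0 - frac) * anchors[lo] + frac * anchors[hi]

# Values copied from the configuration above.
self_attn_t = [0, 0.5, 0.3, 0.7, 1]
mlp_t = [1, 0.5, 0.7, 0.3, 0]

for layer in (0, 16, 31):
    print(layer, round(layer_t(self_attn_t, layer, 32), 3), round(layer_t(mlp_t, layer, 32), 3))
```

Under this reading, `t = 0` keeps the base model's tensor and `t = 1` takes the other model's tensor, so the two lists give the attention and MLP weights opposite blends at each depth, while every remaining tensor uses the fixed `0.5`.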
## 💻 Usage
```python
# Install dependencies first. In a notebook cell:
#   !pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-Instruct-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into the model's prompt format.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the merged model in half precision and place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion of up to 256 new tokens.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
# 🏆 Evaluation Scores
## Nous
| Model |AGIEval|TruthfulQA|Bigbench|
|---------------------------------------------------------------------------------------------|------:|---------:|-------:|
|[yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp)| 33.28| 49.78| 35.38|
### AGIEval
| Task |Version| Metric | Value | | Stderr |
|------------------------------|------:|---------|------:|---|-------:|
| agieval_aqua_rat | 0| acc | 23.62 |± | 2.67 |
| | | acc_norm| 22.05 |± | 2.61 |
| agieval_logiqa_en | 0| acc | 27.50 |± | 1.75 |
| | | acc_norm| 31.80 |± | 1.83 |
| agieval_lsat_ar | 0| acc | 21.30 |± | 2.71 |
| | | acc_norm| 20.87 |± | 2.69 |
| agieval_lsat_lr | 0| acc | 35.29 |± | 2.12 |
| | | acc_norm| 37.65 |± | 2.15 |
| agieval_lsat_rc | 0| acc | 42.01 |± | 3.01 |
| | | acc_norm| 39.78 |± | 2.99 |
| agieval_sat_en | 0| acc | 55.83 |± | 3.47 |
| | | acc_norm| 50.49 |± | 3.49 |
| agieval_sat_en_without_passage| 0| acc | 36.89 |± | 3.37 |
| | | acc_norm| 34.95 |± | 3.33 |
| agieval_sat_math | 0| acc | 29.55 |± | 3.08 |
| | | acc_norm| 28.64 |± | 3.05 |

**Average score**: 33.28%
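
The AGIEval average appears to be the unweighted mean of the `acc_norm` rows in the table above. The short check below reproduces it from those values; the dictionary name `acc_norm` is only an illustrative label, and the numbers are copied from the table.

```python
# acc_norm values copied from the AGIEval table.
acc_norm = {
    "agieval_aqua_rat": 22.05,
    "agieval_logiqa_en": 31.80,
    "agieval_lsat_ar": 20.87,
    "agieval_lsat_lr": 37.65,
    "agieval_lsat_rc": 39.78,
    "agieval_sat_en": 50.49,
    "agieval_sat_en_without_passage": 34.95,
    "agieval_sat_math": 28.64,
}

# Unweighted mean over the eight tasks.
average = sum(acc_norm.values()) / len(acc_norm)
print(f"{average:.2f}")  # prints 33.28
```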
### TruthfulQA
| Task |Version| Metric | Value | | Stderr |
|---------------------|------:|--------|------:|---|-------:|
| truthfulqa_mc | 1| mc1 | 33.54 |± | 1.65 |
| | | mc2 | 49.78 |± | 1.53 |

**Average score**: 49.78%
### BigBench
| Task |Version| Metric | Value | | Stderr |
|------------------------------------|------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement | 0| multiple_choice_grade | 47.89 |± | 3.63 |
| bigbench_date_understanding | 0| multiple_choice_grade | 39.02 |± | 2.54 |
| bigbench_disambiguation_qa | 0| multiple_choice_grade | 33.72 |± | 2.95 |
| bigbench_geometric_shapes | 0| multiple_choice_grade | 20.61 |± | 2.14 |
| bigbench_logical_deduction_five_objects| 0| multiple_choice_grade | 31.40 |± | 2.08 |
| bigbench_logical_deduction_seven_objects| 0| multiple_choice_grade | 23.71 |± | 1.61 |
| bigbench_logical_deduction_three_objects| 0| multiple_choice_grade | 47.00 |± | 2.89 |
| bigbench_movie_recommendation | 0| multiple_choice_grade | 27.40 |± | 1.99 |
| bigbench_navigate | 0| multiple_choice_grade | 50.10 |± | 1.58 |
| bigbench_reasoning_about_colored_objects| 0| multiple_choice_grade | 38.40 |± | 1.09 |
| bigbench_ruin_names | 0| multiple_choice_grade | 27.23 |± | 2.11 |
| bigbench_salient_translation_error_detection| 0| multiple_choice_grade | 25.45 |± | 1.38 |
| bigbench_snarks | 0| multiple_choice_grade | 46.41 |± | 3.72 |
| bigbench_sports_understanding | 0| multiple_choice_grade | 50.30 |± | 1.59 |
| bigbench_temporal_sequences | 0| multiple_choice_grade | 37.30 |± | 1.53 |
| bigbench_tracking_shuffled_objects_five_objects| 0| multiple_choice_grade | 21.36 |± | 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects| 0| multiple_choice_grade | 17.14 |± | 0.90 |
| bigbench_tracking_shuffled_objects_three_objects| 0| multiple_choice_grade | 47.00 |± | 2.89 |

**Average score**: 35.38%