---
base_model:
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
---
# Llama3-8B-Instruct-Slerp
Llama3-8B-Instruct-Slerp is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [yuvraj17/EvolCodeLlama-3.1-8B-Instruct](https://huggingface.co/yuvraj17/EvolCodeLlama-3.1-8B-Instruct)
* [yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1](https://huggingface.co/yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1)
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
        layer_range: [0, 32]
      - model: yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
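For intuition, here is a minimal sketch of the spherical linear interpolation (slerp) that the `merge_method` applies per tensor: it interpolates along the arc between two normalized weight directions rather than along the straight line, and the `t` schedule above varies the interpolation factor across layer groups (e.g. self-attention vs. MLP). This is an illustrative simplification, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values move
    along the great-circle arc between the two directions.
    """
    # Normalize copies to measure the angle between the tensors
    v0n = v0.flatten() / (np.linalg.norm(v0) + eps)
    v1n = v1.flatten() / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1

# Midpoint of two orthogonal unit vectors stays on the unit circle
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)  # → [0.7071, 0.7071]
```

Unlike plain linear averaging, slerp preserves the geometric character of the weight vectors (the midpoint above still has unit norm), which is why it is a popular choice for merging model checkpoints.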
## 💻 Usage
```python
# Install dependencies first (in a notebook: !pip install -qU transformers accelerate)
import torch
import transformers
from transformers import AutoTokenizer

model = "yuvraj17/Llama3-8B-Instruct-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
> A large language model is a computer program that can process and generate human language. It is typically trained on a large corpus of text data and uses machine learning algorithms to learn the patterns and structures of language. Large language models are able to generate coherent and context-specific text based on a given prompt or input.
> There are two main types of large language models: generative and discriminative. Generative models aim to generate new text based on the patterns they've learned from the training data. Discriminative models aim to classify or label the input text as belonging to a particular class or genre.
> Some of the key characteristics of large language models include:
> 1. Scale: Large language models are trained on vast amounts of text data, often measured in gigabytes or terabytes. This large scale allows them to learn complex patterns and structures of language.
> 2. Complexity: The internal workings of a large language model can be quite complex. They use advanced algorithms and mathematical techniques, such as transformers and recurrent neural networks, to process and generate text.
> 3. Flexibility: Large language models can be fine-tuned for specific tasks or domains by adjusting their hyperparameters or adding small amounts of task-specific data. This allows them to adapt to new domains or tasks.
> 4. Interpretability: While large language models are not always interpretable, some models provide insights into their decision-making process. This can be useful for tasks like question-answering or text summarization, where understanding the reasoning behind the model's output is important.
> 5. Evaluation: The performance of a large language model is typically evaluated using metrics such as perplexity, BLEU, or ROUGE. These metrics measure how well the model can predict the next word or sentence in a sequence, or how well it can generate coherent and meaningful text.
>
> Some examples of large language models include:
> 1. BERT (Bidirectional Encoder Representations from Transformers): Developed by Google, BERT is a widely used pre-trained language model that can be fine-tuned for a variety of NLP tasks.
> 2. RoBERTa (Robust BERT): Developed by Salesforce, RoBERTa is a variation of BERT that has been fine-tuned on a larger corpus of text data.
> 3. XLNet (Extreme Large Neural Network): Developed by Microsoft Research, XLNet is a large transformer-based language model that can process long-range dependencies in text data.
> 4. T5 (Text-to-Text Transfer Transformer): Developed by Google, T5 is a large encoder-decoder model that can perform a wide
# 🏆 Evaluation Scores
## Nous
| Model |AGIEval|TruthfulQA|Bigbench|
|---------------------------------------------------------------------------------------------|------:|---------:|-------:|
|[yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp)| 33.28| 49.78| 35.38|
### AGIEval
| Task |Version| Metric | Value | | Stderr |
|------------------------------|------:|---------|------:|---|-------:|
| agieval_aqua_rat | 0| acc | 23.62 |± | 2.67 |
| | | acc_norm| 22.05 |± | 2.61 |
| agieval_logiqa_en | 0| acc | 27.50 |± | 1.75 |
| | | acc_norm| 31.80 |± | 1.83 |
| agieval_lsat_ar | 0| acc | 21.30 |± | 2.71 |
| | | acc_norm| 20.87 |± | 2.69 |
| agieval_lsat_lr | 0| acc | 35.29 |± | 2.12 |
| | | acc_norm| 37.65 |± | 2.15 |
| agieval_lsat_rc | 0| acc | 42.01 |± | 3.01 |
| | | acc_norm| 39.78 |± | 2.99 |
| agieval_sat_en | 0| acc | 55.83 |± | 3.47 |
| | | acc_norm| 50.49 |± | 3.49 |
| agieval_sat_en_without_passage| 0| acc | 36.89 |± | 3.37 |
| | | acc_norm| 34.95 |± | 3.33 |
| agieval_sat_math | 0| acc | 29.55 |± | 3.08 |
| | | acc_norm| 28.64 |± | 3.05 |
**Average score**: 33.28%
### TruthfulQA
| Task |Version| Metric | Value | | Stderr |
|---------------------|------:|--------|------:|---|-------:|
| truthfulqa_mc | 1| mc1 | 33.54 |± | 1.65 |
| | | mc2 | 49.78 |± | 1.53 |
**Average score**: 49.78%
### BigBench
| Task |Version| Metric | Value | | Stderr |
|------------------------------------|------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement | 0| multiple_choice_grade | 47.89 |± | 3.63 |
| bigbench_date_understanding | 0| multiple_choice_grade | 39.02 |± | 2.54 |
| bigbench_disambiguation_qa | 0| multiple_choice_grade | 33.72 |± | 2.95 |
| bigbench_geometric_shapes | 0| multiple_choice_grade | 20.61 |± | 2.14 |
| bigbench_logical_deduction_five_objects| 0| multiple_choice_grade | 31.40 |± | 2.08 |
| bigbench_logical_deduction_seven_objects| 0| multiple_choice_grade | 23.71 |± | 1.61 |
| bigbench_logical_deduction_three_objects| 0| multiple_choice_grade | 47.00 |± | 2.89 |
| bigbench_movie_recommendation | 0| multiple_choice_grade | 27.40 |± | 1.99 |
| bigbench_navigate | 0| multiple_choice_grade | 50.10 |± | 1.58 |
| bigbench_reasoning_about_colored_objects| 0| multiple_choice_grade | 38.40 |± | 1.09 |
| bigbench_ruin_names | 0| multiple_choice_grade | 27.23 |± | 2.11 |
| bigbench_salient_translation_error_detection| 0| multiple_choice_grade | 25.45 |± | 1.38 |
| bigbench_snarks | 0| multiple_choice_grade | 46.41 |± | 3.72 |
| bigbench_sports_understanding | 0| multiple_choice_grade | 50.30 |± | 1.59 |
| bigbench_temporal_sequences | 0| multiple_choice_grade | 37.30 |± | 1.53 |
| bigbench_tracking_shuffled_objects_five_objects| 0| multiple_choice_grade | 21.36 |± | 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects| 0| multiple_choice_grade | 17.14 |± | 0.90 |
| bigbench_tracking_shuffled_objects_three_objects| 0| multiple_choice_grade | 47.00 |± | 2.89 |
**Average score**: 35.38%