---
base_model:
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
---
# Llama3-8B-Instruct-Slerp
Llama3-8B-Instruct-Slerp is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [yuvraj17/EvolCodeLlama-3.1-8B-Instruct](https://huggingface.co/yuvraj17/EvolCodeLlama-3.1-8B-Instruct)
* [yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1](https://huggingface.co/yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1)
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
        layer_range: [0, 32]
      - model: yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
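For intuition, here is a minimal sketch of the spherical linear interpolation (slerp) that the `merge_method` applies per tensor: it interpolates along the arc between two normalized weight directions rather than along the straight line, and the `t` schedule above varies the interpolation factor across layer groups (e.g. self-attention vs. MLP). This is an illustrative simplification, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values move
    along the great-circle arc between the two directions.
    """
    # Normalize copies to measure the angle between the tensors
    v0n = v0.flatten() / (np.linalg.norm(v0) + eps)
    v1n = v1.flatten() / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1

# Midpoint of two orthogonal unit vectors stays on the unit circle
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)  # → [0.7071, 0.7071]
```

Unlike plain linear averaging, slerp preserves the geometric character of the weight vectors (the midpoint above still has unit norm), which is why it is a popular choice for merging model checkpoints.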
## 💻 Usage
```python
# Install dependencies first (in a notebook: !pip install -qU transformers accelerate)
import torch
import transformers
from transformers import AutoTokenizer

model = "yuvraj17/Llama3-8B-Instruct-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
> A large language model is a computer program that can process and generate human language. It is typically trained on a large corpus of text data and uses machine learning algorithms to learn the patterns and structures of language. Large language models are able to generate coherent and context-specific text based on a given prompt or input.
> There are two main types of large language models: generative and discriminative. Generative models aim to generate new text based on the patterns they've learned from the training data. Discriminative models aim to classify or label the input text as belonging to a particular class or genre.
> Some of the key characteristics of large language models include:
> 1. Scale: Large language models are trained on vast amounts of text data, often measured in gigabytes or terabytes. This large scale allows them to learn complex patterns and structures of language.
> 2. Complexity: The internal workings of a large language model can be quite complex. They use advanced algorithms and mathematical techniques, such as transformers and recurrent neural networks, to process and generate text.
> 3. Flexibility: Large language models can be fine-tuned for specific tasks or domains by adjusting their hyperparameters or adding small amounts of task-specific data. This allows them to adapt to new domains or tasks.
> 4. Interpretability: While large language models are not always interpretable, some models provide insights into their decision-making process. This can be useful for tasks like question-answering or text summarization, where understanding the reasoning behind the model's output is important.
> 5. Evaluation: The performance of a large language model is typically evaluated using metrics such as perplexity, BLEU, or ROUGE. These metrics measure how well the model can predict the next word or sentence in a sequence, or how well it can generate coherent and meaningful text.
>
> Some examples of large language models include:
> 1. BERT (Bidirectional Encoder Representations from Transformers): Developed by Google, BERT is a widely used pre-trained language model that can be fine-tuned for a variety of NLP tasks.
> 2. RoBERTa (Robust BERT): Developed by Salesforce, RoBERTa is a variation of BERT that has been fine-tuned on a larger corpus of text data.
> 3. XLNet (Extreme Large Neural Network): Developed by Microsoft Research, XLNet is a large transformer-based language model that can process long-range dependencies in text data.
> 4. T5 (Text-to-Text Transfer Transformer): Developed by Google, T5 is a large encoder-decoder model that can perform a wide
# 🏆 Evaluation Scores
## Nous
| Model |AGIEval|TruthfulQA|Bigbench|
|---------------------------------------------------------------------------------------------|------:|---------:|-------:|
|[yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp)| 33.28| 49.78| 35.38|
### AGIEval
| Task |Version| Metric | Value | | Stderr |
|------------------------------|------:|---------|------:|---|-------:|
| agieval_aqua_rat | 0| acc | 23.62 |± | 2.67 |
| | | acc_norm| 22.05 |± | 2.61 |
| agieval_logiqa_en | 0| acc | 27.50 |± | 1.75 |
| | | acc_norm| 31.80 |± | 1.83 |
| agieval_lsat_ar | 0| acc | 21.30 |± | 2.71 |
| | | acc_norm| 20.87 |± | 2.69 |
| agieval_lsat_lr | 0| acc | 35.29 |± | 2.12 |
| | | acc_norm| 37.65 |± | 2.15 |
| agieval_lsat_rc | 0| acc | 42.01 |± | 3.01 |
| | | acc_norm| 39.78 |± | 2.99 |
| agieval_sat_en | 0| acc | 55.83 |± | 3.47 |
| | | acc_norm| 50.49 |± | 3.49 |
| agieval_sat_en_without_passage| 0| acc | 36.89 |± | 3.37 |
| | | acc_norm| 34.95 |± | 3.33 |
| agieval_sat_math | 0| acc | 29.55 |± | 3.08 |
| | | acc_norm| 28.64 |± | 3.05 |
**Average score**: 33.28%
### TruthfulQA
| Task |Version| Metric | Value | | Stderr |
|---------------------|------:|--------|------:|---|-------:|
| truthfulqa_mc | 1| mc1 | 33.54 |± | 1.65 |
| | | mc2 | 49.78 |± | 1.53 |
**Average score**: 49.78%
### BigBench
| Task |Version| Metric | Value | | Stderr |
|------------------------------------|------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement | 0| multiple_choice_grade | 47.89 |± | 3.63 |
| bigbench_date_understanding | 0| multiple_choice_grade | 39.02 |± | 2.54 |
| bigbench_disambiguation_qa | 0| multiple_choice_grade | 33.72 |± | 2.95 |
| bigbench_geometric_shapes | 0| multiple_choice_grade | 20.61 |± | 2.14 |
| bigbench_logical_deduction_five_objects| 0| multiple_choice_grade | 31.40 |± | 2.08 |
| bigbench_logical_deduction_seven_objects| 0| multiple_choice_grade | 23.71 |± | 1.61 |
| bigbench_logical_deduction_three_objects| 0| multiple_choice_grade | 47.00 |± | 2.89 |
| bigbench_movie_recommendation | 0| multiple_choice_grade | 27.40 |± | 1.99 |
| bigbench_navigate | 0| multiple_choice_grade | 50.10 |± | 1.58 |
| bigbench_reasoning_about_colored_objects| 0| multiple_choice_grade | 38.40 |± | 1.09 |
| bigbench_ruin_names | 0| multiple_choice_grade | 27.23 |± | 2.11 |
| bigbench_salient_translation_error_detection| 0| multiple_choice_grade | 25.45 |± | 1.38 |
| bigbench_snarks | 0| multiple_choice_grade | 46.41 |± | 3.72 |
| bigbench_sports_understanding | 0| multiple_choice_grade | 50.30 |± | 1.59 |
| bigbench_temporal_sequences | 0| multiple_choice_grade | 37.30 |± | 1.53 |
| bigbench_tracking_shuffled_objects_five_objects| 0| multiple_choice_grade | 21.36 |± | 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects| 0| multiple_choice_grade | 17.14 |± | 0.90 |
| bigbench_tracking_shuffled_objects_three_objects| 0| multiple_choice_grade | 47.00 |± | 2.89 |
**Average score**: 35.38%