---
base_model:
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
tags:
- merge
- mergekit
- lazymergekit
- yuvraj17/EvolCodeLlama-3.1-8B-Instruct
- yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
---

# Llama3-8B-Instruct-Slerp

Llama3-8B-Instruct-Slerp is a SLERP merge of the following models, made with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [yuvraj17/EvolCodeLlama-3.1-8B-Instruct](https://huggingface.co/yuvraj17/EvolCodeLlama-3.1-8B-Instruct)
* [yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1](https://huggingface.co/yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
        layer_range: [0, 32]
      - model: yzhuang/Meta-Llama-3-8B-Instruct_fictional_gsm8k_English_v1
        layer_range: [0, 32]
merge_method: slerp
base_model: yuvraj17/EvolCodeLlama-3.1-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
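The `t` parameter is the interpolation factor between the two models: 0 keeps the base model's weights, 1 takes the other model's. The lists above are anchor points that mergekit spreads across the 32 layers, so self-attention weights lean toward the base model in early layers and toward the GSM8K fine-tune in later layers, while the MLP schedule runs the other way. The sketch below is a minimal, illustrative SLERP over a single pair of weight tensors, assuming stand-in NumPy arrays; it is not mergekit's internal implementation.

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors (illustrative)."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two weight directions
    if omega < eps:
        # Nearly parallel directions: fall back to plain linear interpolation.
        return ((1 - t) * a_flat + t * b_flat).reshape(a.shape)
    so = np.sin(omega)
    out = (np.sin((1 - t) * omega) / so) * a_flat + (np.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape)

# Stand-in tensors: t=0 recovers the first tensor, t=1 the second.
w_base = np.random.randn(4, 4).astype(np.float32)
w_other = np.random.randn(4, 4).astype(np.float32)
merged = slerp(0.5, w_base, w_other)

# Roughly how a 5-point anchor list maps to a per-layer t schedule
# (an assumption about the interpolation, not mergekit's exact code):
t_self_attn = np.interp(np.linspace(0, 1, 32), np.linspace(0, 1, 5), [0, 0.5, 0.3, 0.7, 1])
```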
## 💻 Usage

```python
# pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-Instruct-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

# 🏆 Evaluation Scores

## Nous

| Model | AGIEval | TruthfulQA | Bigbench |
|---------------------------------------------------------------------------------------------|------:|---------:|-------:|
| [yuvraj17/Llama3-8B-Instruct-Slerp](https://huggingface.co/yuvraj17/Llama3-8B-Instruct-Slerp) | 33.28 | 49.78 | 35.38 |

### AGIEval

| Task | Version | Metric | Value | | Stderr |
|------------------------------|------:|---------|------:|---|-------:|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 22.05 | ± | 2.61 |
| agieval_logiqa_en | 0 | acc | 27.50 | ± | 1.75 |
| | | acc_norm | 31.80 | ± | 1.83 |
| agieval_lsat_ar | 0 | acc | 21.30 | ± | 2.71 |
| | | acc_norm | 20.87 | ± | 2.69 |
| agieval_lsat_lr | 0 | acc | 35.29 | ± | 2.12 |
| | | acc_norm | 37.65 | ± | 2.15 |
| agieval_lsat_rc | 0 | acc | 42.01 | ± | 3.01 |
| | | acc_norm | 39.78 | ± | 2.99 |
| agieval_sat_en | 0 | acc | 55.83 | ± | 3.47 |
| | | acc_norm | 50.49 | ± | 3.49 |
| agieval_sat_en_without_passage | 0 | acc | 36.89 | ± | 3.37 |
| | | acc_norm | 34.95 | ± | 3.33 |
| agieval_sat_math | 0 | acc | 29.55 | ± | 3.08 |
| | | acc_norm | 28.64 | ± | 3.05 |

**Average score**: 33.28%

### TruthfulQA

| Task | Version | Metric | Value | | Stderr |
|---------------------|------:|--------|------:|---|-------:|
| truthfulqa_mc | 1 | mc1 | 33.54 | ± | 1.65 |
| | | mc2 | 49.78 | ± | 1.53 |

**Average score**: 49.78%

### BigBench

| Task | Version | Metric | Value | | Stderr |
|------------------------------------|------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 47.89 | ± | 3.63 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 39.02 | ± | 2.54 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 33.72 | ± | 2.95 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 20.61 | ± | 2.14 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 31.40 | ± | 2.08 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.71 | ± | 1.61 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 47.00 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 27.40 | ± | 1.99 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 38.40 | ± | 1.09 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 27.23 | ± | 2.11 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 25.45 | ± | 1.38 |
| bigbench_snarks | 0 | multiple_choice_grade | 46.41 | ± | 3.72 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 50.30 | ± | 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 37.30 | ± | 1.53 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.36 | ± | 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.14 | ± | 0.90 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 47.00 | ± | 2.89 |

**Average score**: 35.38%