---
language:
- en
license: llama3
tags:
- axolotl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- BEE-spoke-data/KI-smorgasbord_fw-small
pipeline_tag: text-generation
model-index:
- name: Llama-3-6.3b-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 10.44
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 18.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.51
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.47
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.15
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 20.44
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
---

# Llama-3-6.3b-v0.1

This is a layer-pruning experiment based on the original Llama-3-8B:

- 8 layers pruned with [PruneMe](https://github.com/pszemraj/PruneMe/tree/upgrades)/MergeKit
- layers selected using [BEE-spoke-data/fineweb-100k_en-med](https://hf.co/datasets/BEE-spoke-data/fineweb-100k_en-med)
- brief subsequent continued pretraining @ ctx 4096
  - data: 10k rows of FineWeb (different from the pruning data) plus some curated data
  - wandb logs [here](https://wandb.ai/pszemraj/llama3-pruning)

## Quick eval

`hf (pretrained=pszemraj/Llama-3-6.3b-v0.1,trust_remote_code=True,dtype=bfloat16)`, gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1

| Tasks          | Version | Filter | n-shot | Metric     | Value  |    | Stderr |
|----------------|--------:|--------|-------:|------------|-------:|----|-------:|
| arc_easy       |       1 | none   |      0 | acc        | 0.7109 | ±  | 0.0093 |
|                |         | none   |      0 | acc_norm   | 0.6843 | ±  | 0.0095 |
| boolq          |       2 | none   |      0 | acc        | 0.7920 | ±  | 0.0071 |
| lambada_openai |       1 | none   |      0 | perplexity | 4.5411 | ±  | 0.1073 |
|                |         | none   |      0 | acc        | 0.6734 | ±  | 0.0065 |
| openbookqa     |       1 | none   |      0 | acc        | 0.3000 | ±  | 0.0205 |
|                |         | none   |      0 | acc_norm   | 0.4140 | ±  | 0.0220 |
| piqa           |       1 | none   |      0 | acc        | 0.7443 | ±  | 0.0102 |
|                |         | none   |      0 | acc_norm   | 0.7530 | ±  | 0.0101 |
| winogrande     |       1 | none   |      0 | acc        | 0.7127 | ±  | 0.0127 |

## Details

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`

```yaml
base_model: pszemraj/llama-3-prune_8
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
strict: false
seed: 80085

# dataset
datasets:
  - path: BEE-spoke-data/KI-smorgasbord_fw-small
    type: completion # format from earlier
    field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.015

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-pruning
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: Llama-3-6.3b-v0.1

hub_model_id: pszemraj/Llama-3-6.3b-v0.1
hub_strategy: every_save

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch_fused # paged_adamw_32bit
weight_decay: 0.05
lr_scheduler: cosine
learning_rate: 4e-5
warmup_ratio: 0.1

load_in_8bit: false
load_in_4bit: false
bfloat16: true
tf32: true

flash_attention: true
torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# hyperparams for freq of evals, saving, etc
evals_per_epoch: 5
saves_per_epoch: 3
save_safetensors: true
save_total_limit: 1
output_dir: ./output-axolotl/output-model-6.3b
logging_steps: 8

deepspeed:

special_tokens:
  pad_token: <|end_of_text|>
```

</details>
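The "6.3b" in the model name follows from the architecture: pruning 8 of Llama-3-8B's 32 layers leaves 24. A rough back-of-the-envelope count using the published Llama-3-8B dimensions (a sanity-check sketch, not an exact count — RMSNorm weights are ignored):

```python
# Rough parameter count for Llama-3-8B with 8 of its 32 layers pruned.
hidden = 4096          # hidden size
intermediate = 14336   # MLP intermediate size
vocab = 128256         # vocabulary size
n_heads, n_kv_heads = 32, 8
head_dim = hidden // n_heads          # 128
kv_dim = n_kv_heads * head_dim        # 1024 (grouped-query attention)

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o + k/v projections
mlp = 3 * hidden * intermediate                   # gate/up/down projections
per_layer = attn + mlp                            # norms are negligible
embeddings = 2 * vocab * hidden                   # input embeddings + untied lm_head

total = 24 * per_layer + embeddings
print(f"{total / 1e9:.2f}B")  # ≈ 6.29B -> "6.3b"
```

The same arithmetic with all 32 layers gives ≈ 8.03B, matching the original model.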

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0006 | 1    | 7.8100          |
| 2.2782        | 0.2002 | 320  | 2.3728          |
| 2.2699        | 0.4004 | 640  | 2.3265          |
| 2.3761        | 0.6006 | 960  | 2.2849          |
| 2.2448        | 0.8008 | 1280 | 2.2702          |

---

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__Llama-3-6.3b-v0.1)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 10.28 |
| IFEval (0-Shot)     | 10.44 |
| BBH (3-Shot)        | 18.68 |
| MATH Lvl 5 (4-Shot) |  1.51 |
| GPQA (0-shot)       |  4.47 |
| MuSR (0-shot)       |  6.15 |
| MMLU-PRO (5-shot)   | 20.44 |
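The `Avg.` row is just the unweighted mean of the six benchmark scores, as a quick check:

```python
# Open LLM Leaderboard scores for pszemraj/Llama-3-6.3b-v0.1
scores = {
    "IFEval (0-Shot)": 10.44,
    "BBH (3-Shot)": 18.68,
    "MATH Lvl 5 (4-Shot)": 1.51,
    "GPQA (0-shot)": 4.47,
    "MuSR (0-shot)": 6.15,
    "MMLU-PRO (5-shot)": 20.44,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 10.28
```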