This model is intended as a base for further fine-tuning and should not be used for inference as-is. It is a pruned version of Meta-Llama-3-70B-Instruct.

Meta-Llama-3-70B-Instruct has 70.6 billion parameters; Drobeta-Turnu-Severin has 44.9 billion (~64% of the original parameter count).

Steps to replicate:

Use laserQlora.ipynb from cognitivecomputations/laserRMT to determine which layers should be eliminated.

Adapt the script for Meta-Llama-3-70B-Instruct by replacing model_name = "mistralai/Mistral-7B-v0.1" with model_name = "Meta-Llama-3-70B-Instruct" and layer_numbers = list(range(31, -1, -1)) with layer_numbers = list(range(79, -1, -1)), 79 being the index of the last decoder layer in Meta-Llama-3-70B-Instruct (it has 80 decoder layers, versus 32 in Mistral-7B).
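For reference, the two adapted lines would look like this (a minimal sketch; the rest of the notebook stays unchanged, and the fully qualified Hub id meta-llama/Meta-Llama-3-70B-Instruct is used here so the checkpoint can actually be downloaded):

model_name = "meta-llama/Meta-Llama-3-70B-Instruct"  # was "mistralai/Mistral-7B-v0.1"
layer_numbers = list(range(79, -1, -1))  # was list(range(31, -1, -1)); iterate layers 79 down to 0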

Then look for the layer indexes where the self_attn.v_proj SNR is infinite and eliminate those layers using mergekit. Here are the layer indexes that were eliminated: 11, 17, 37, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69.
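A small helper along these lines can collect those indexes (a sketch only; snr_by_module is a hypothetical dict mapping module names to their computed SNR values, not a structure laserQlora.ipynb produces verbatim):

import math

def v_proj_layers_with_infinite_snr(snr_by_module):
    # snr_by_module: e.g. {"model.layers.11.self_attn.v_proj": float("inf"), ...}
    drop = []
    for name, snr in snr_by_module.items():
        if "self_attn.v_proj" in name and math.isinf(snr):
            drop.append(int(name.split(".")[2]))  # "model.layers.<idx>.self_attn.v_proj" -> <idx>
    return sorted(drop)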

Here is the mergekit config (note that layer_range bounds are half-open, so e.g. [0, 11] keeps layers 0-10 and drops layer 11):

slices:
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [0, 11]   # keep layers 0-10 (drop 11)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [12, 17]  # keep layers 12-16 (drop 17)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [18, 37]  # keep layers 18-36 (drop 37)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [38, 40]  # keep layers 38-39 (drop 40-46)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [47, 48]  # keep layer 47 (drop 48-51)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [52, 53]  # keep layer 52 (drop 53-55)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [56, 57]  # keep layer 56 (drop 57-69)
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [70, 80]  # keep layers 70-79
merge_method: passthrough
dtype: bfloat16
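The merge can then be run with mergekit's CLI, e.g. mergekit-yaml config.yaml ./Drobeta-Turnu-Severin (output path chosen here for illustration). As a sanity check that 50 of the original 80 layers survived, the resulting config can be inspected (a sketch, assuming the output directory above):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("./Drobeta-Turnu-Severin")
print(cfg.num_hidden_layers)  # expected: 50 (80 layers minus the 30 eliminated)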