Quantization made by Richard Erkhov.

Mistral-10.7B-Instruct-v0.3-depth-upscaling - GGUF

| Name | Quant method | Size |
| --- | --- | --- |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q2_K.gguf | Q2_K | 3.73GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_XS.gguf | IQ3_XS | 4.14GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_S.gguf | IQ3_S | 4.37GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_S.gguf | Q3_K_S | 4.35GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_M.gguf | IQ3_M | 4.52GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K.gguf | Q3_K | 4.84GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_M.gguf | Q3_K_M | 4.84GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_L.gguf | Q3_K_L | 5.27GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ4_XS.gguf | IQ4_XS | 5.43GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_0.gguf | Q4_0 | 5.66GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ4_NL.gguf | IQ4_NL | 5.72GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K_S.gguf | Q4_K_S | 5.7GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K.gguf | Q4_K | 6.02GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K_M.gguf | Q4_K_M | 6.02GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_1.gguf | Q4_1 | 6.28GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_0.gguf | Q5_0 | 6.89GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K_S.gguf | Q5_K_S | 6.89GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K.gguf | Q5_K | 7.08GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K_M.gguf | Q5_K_M | 7.08GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_1.gguf | Q5_1 | 7.51GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q6_K.gguf | Q6_K | 8.21GB |
| Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q8_0.gguf | Q8_0 | 10.63GB |
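
One way to use the table above is to pick the largest file that fits a given memory budget. The sketch below is only an illustration: the sizes are copied from the table, while `pick_quant` and its `overhead_gb` allowance (a rough placeholder for KV cache and runtime buffers) are hypothetical and not part of this repository. The Q3_K, Q4_K, and Q5_K rows are left out because they have the same sizes as their `_M` counterparts.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 3.73, "IQ3_XS": 4.14, "IQ3_S": 4.37, "Q3_K_S": 4.35,
    "IQ3_M": 4.52, "Q3_K_M": 4.84, "Q3_K_L": 5.27, "IQ4_XS": 5.43,
    "Q4_0": 5.66, "IQ4_NL": 5.72, "Q4_K_S": 5.7, "Q4_K_M": 6.02,
    "Q4_1": 6.28, "Q5_0": 6.89, "Q5_K_S": 6.89, "Q5_K_M": 7.08,
    "Q5_1": 7.51, "Q6_K": 8.21, "Q8_0": 10.63,
}


def pick_quant(budget_gb, overhead_gb=1.0):
    """Return the largest quant whose file size plus overhead fits the budget.

    overhead_gb is an assumed allowance, not a measured value.
    Returns None when nothing fits.
    """
    fitting = {
        name: size
        for name, size in QUANT_SIZES_GB.items()
        if size + overhead_gb <= budget_gb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(pick_quant(7.5))
```

File size is only a proxy for memory use; actual requirements depend on context length and the runtime, so the allowance should be adjusted for the target setup.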

Original model description:

base_model:
- mistralai/Mistral-7B-Instruct-v0.3
library_name: transformers
license: apache-2.0
language:
- en

mistral-7b-instruct-v0.3-depth-upscaling

Mistral: a strong, cold northwesterly wind that blows through the Rhône valley and southern France into the Mediterranean, mainly in winter.

This is an attempt at depth up-scaling, based on the paper SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling, which describes a technique for efficiently scaling large language models. The process begins with structural depthwise scaling, which may initially reduce performance; according to the paper, that loss is rapidly recovered during a continued pretraining phase. This phase adapts the expanded model's parameters to the new depth configuration and significantly improves performance.

Note that this model represents only the first phase of development. The next critical step is fine-tuning. As expected, and consistent with the paper, the initial upscaled model from phase one (without fine-tuning) scores lower than the base model. Its scores are expected to surpass those of the base model once fine-tuning is finished. Feel free to fine-tune on your own dataset.

Merge Details

Merge Method

This model was merged using the passthrough merge method. The first 24 layers (0 to 23) of one copy of the model are stitched to the last 24 layers (8 to 31) of another copy, resulting in a total of 48 layers with 10.7B parameters.
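
The 10.7B figure can be sanity-checked with a rough parameter count. The sketch below assumes the layer shapes from the published Mistral-7B-Instruct-v0.3 configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, vocabulary 32768, untied output head); none of these values are stated in this card, so treat them as assumptions.

```python
# Assumed shapes, taken from the base model's published config (not from this card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads, head dimension 128
VOCAB = 32768


def param_count(num_layers):
    """Approximate parameter count for a Mistral-style decoder with num_layers blocks."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o and k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                        # gate, up, down projections
    norms = 2 * HIDDEN                                     # two RMSNorm weights per block
    per_layer = attention + mlp + norms
    embeddings = 2 * VOCAB * HIDDEN                        # input embeddings + output head
    final_norm = HIDDEN
    return num_layers * per_layer + embeddings + final_norm


print(f"32 layers: {param_count(32) / 1e9:.2f}B")
print(f"48 layers: {param_count(48) / 1e9:.2f}B")
```

Under these assumptions the 32-layer base comes out at about 7.25B parameters and the 48-layer stack at about 10.74B, which is consistent with the 10.7B stated above. Only the decoder blocks are duplicated; the embeddings and output head are counted once.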

Models Merged

The following models were included in the merge:

mistralai/Mistral-7B-Instruct-v0.3

Configuration

The following configuration was used to produce this model:

slices:
  - sources:
    - model: mistralai/Mistral-7B-Instruct-v0.3
      layer_range: [0, 24]
  - sources:
    - model: mistralai/Mistral-7B-Instruct-v0.3
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
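
To make the slicing concrete, the sketch below expands the two `layer_range` entries into the list of base-model layers that each of the 48 upscaled layers is copied from. It assumes the ranges are half-open (start included, end excluded), which matches the 24 + 24 = 48 layer count given above; `upscaled_layer_sources` is a hypothetical helper, not part of any merge tool.

```python
def upscaled_layer_sources(slices):
    """Expand half-open [start, end) layer ranges into a flat list of source layer indices."""
    sources = []
    for start, end in slices:
        sources.extend(range(start, end))
    return sources


# The two slices from the configuration above.
sources = upscaled_layer_sources([(0, 24), (8, 32)])

# Base-model layers that appear more than once in the upscaled stack.
duplicated = sorted({i for i in sources if sources.count(i) > 1})

print(len(sources))
print(duplicated[0], duplicated[-1], len(duplicated))
```

Under that reading, base layers 8 through 23 appear twice in the upscaled model, while layers 0 to 7 and 24 to 31 appear once.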

Eval results:

| Metric | Value |
| --- | --- |
| Avg. | 64.04 |
| ARC (25-shot) | 63.14 |
| HellaSwag (10-shot) | 83.29 |
| MMLU (5-shot) | 62.31 |
| TruthfulQA (0-shot) | 60.65 |
| Winogrande (5-shot) | 78.45 |
| GSM8K (5-shot) | 36.39 |
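
The reported average is the unweighted mean of the six benchmark scores, which can be checked with a few lines of Python (values copied from the results above):

```python
# Benchmark scores copied from the eval results above.
scores = {
    "ARC (25-shot)": 63.14,
    "HellaSwag (10-shot)": 83.29,
    "MMLU (5-shot)": 62.31,
    "TruthfulQA (0-shot)": 60.65,
    "Winogrande (5-shot)": 78.45,
    "GSM8K (5-shot)": 36.39,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 64.04
```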