
A 4-bit quantized version of mistralai/Mixtral-8x7B-Instruct-v0.1, converted using bitsandbytes. For more information about the base model, refer to its model page.
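For reference, a minimal loading sketch, assuming transformers, accelerate, and bitsandbytes are installed (the weights are already serialized in 4-bit, so no extra quantization config is needed at load time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cmarkea/Mixtral-8x7B-Instruct-v0.1-4bit"

# The checkpoint already stores 4-bit bitsandbytes weights, so a plain
# from_pretrained call is enough; device_map="auto" spreads the model
# across the available devices via accelerate.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```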

## Impact on performance

*Figure: Impact of quantization on a set of models.*

The model was evaluated using the PoLL (Pool of LLM) technique, assessing performance on 100 French questions with scores aggregated from six evaluations (two per evaluator). The evaluators were GPT-4o, Gemini-1.5-pro, and Claude-3.5-Sonnet.
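To make the aggregation concrete, here is a hypothetical sketch (judge names from the card, grades made up) computing a PoLL score as the mean of the six grades, two per evaluator:

```python
from statistics import mean

# Hypothetical grades: each of the three judges scores an answer twice.
judge_grades = {
    "gpt-4o": [4, 4],
    "gemini-1.5-pro": [3, 4],
    "claude-3.5-sonnet": [4, 5],
}

# The final PoLL score is the mean of the six evaluations.
poll_score = mean(g for grades in judge_grades.values() for g in grades)
print(f"PoLL score: {poll_score:.2f} / 5")
```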

Performance Scores (on a scale of 5):

| Model | Score | # params | Size (GB) |
|---|---|---|---|
| gpt-4o | 4.13 | N/A | N/A |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 3.71 | 46.7b | 93.4 |
| cmarkea/Mixtral-8x7B-Instruct-v0.1-4bit | 3.68 | 46.7b | 23.35 |
| gpt-3.5-turbo | 3.66 | 175b | 350 |
| TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ | 3.56 | 46.7b | 46.7 |
| mistralai/Mistral-7B-Instruct-v0.2 | 1.98 | 7.25b | 14.5 |
| cmarkea/bloomz-7b1-mt-sft-chat | 1.69 | 7.07b | 14.14 |
| cmarkea/bloomz-3b-dpo-chat | 1.68 | 3b | 6 |
| cmarkea/bloomz-3b-sft-chat | 1.51 | 3b | 6 |
| croissantllm/CroissantLLMChat-v0.1 | 1.19 | 1.3b | 2.7 |
| cmarkea/bloomz-560m-sft-chat | 1.04 | 0.56b | 1.12 |
| OpenLLM-France/Claire-Mistral-7B-0.1 | 0.38 | 7.25b | 14.5 |

With a score of 3.68 for the 4-bit model versus 3.71 for the original, at a quarter of the memory footprint, the impact of quantization on response quality is negligible.

## Prompt Pattern

As a reminder, here is the prompt pattern for interacting with the model when the add_special_tokens option is disabled (when it is enabled, omit the BOS symbol <s> and the space at the beginning of the sequence, since the tokenizer inserts the BOS token itself):

```
<s> [INST] {user_prompt_1} [/INST] model_answer_1</s> [INST] {user_prompt_2} [/INST] model_answer_2</s>
```
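In practice, the tokenizer's chat template produces this pattern automatically; below is a minimal sketch reusing the model and tokenizer loaded above (message contents and generation parameters are illustrative):

```python
messages = [
    {"role": "user", "content": "user_prompt_1"},
    {"role": "assistant", "content": "model_answer_1"},
    {"role": "user", "content": "user_prompt_2"},
]

# apply_chat_template inserts the <s>, [INST], and </s> markers itself,
# so add_special_tokens=False avoids a duplicated BOS token.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)

output = model.generate(**inputs.to(model.device), max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```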