athirdpath/NeuralHermes-Mistral-13b-DARE_blended-FAILURE


Ooof, my man ain't feeling so hot; I'd pass on this one for now. Inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. These 13b Mistrals, however, seem to HATE it, which I assume is due to the unbalanced nature of my recipe. More study is required.

Recipe

merge_method: dare_ties

base_model: athirdpath/BigMistral-13b

model: athirdpath/NeuralHermes-Mistral-13b
weight: 0.60 / density: 0.35

model: athirdpath/NeuralHermes-Mistral-13b-INV
weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
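
For readers unfamiliar with what a dare_ties recipe does with weight and density, here is a rough, toy-sized sketch in plain Python. It is not the code used to build this model and not any merge tool's actual implementation: the function names, the flat lists standing in for tensors, and the tie-breaking rule are all assumptions made for illustration. The idea shown is the usual one: take each model's difference from the base, randomly keep roughly a `density` fraction of it (rescaled by 1/density), scale by `weight`, then keep only the contributions whose sign agrees with the majority direction.

```python
import random


def dare_sparsify(delta, density, rng):
    """DARE step: keep each entry with probability `density` and
    rescale the survivors by 1/density; dropped entries become 0."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, models, seed=0):
    """Toy DARE + TIES-style merge over flat lists of floats.

    base   -- list of base-model parameters
    models -- list of (params, weight, density) tuples (hypothetical layout)
    """
    rng = random.Random(seed)

    # Build one weighted, sparsified task vector per model.
    scaled = []
    for params, weight, density in models:
        delta = [p - b for p, b in zip(params, base)]
        sparse = dare_sparsify(delta, density, rng)
        scaled.append([weight * d for d in sparse])

    merged = []
    for i, b in enumerate(base):
        column = [vec[i] for vec in scaled]
        # Sign election: the direction of the summed contributions wins.
        # Assumption: an exact tie (sum of 0) is treated as positive.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Keep only contributions that agree with the elected sign.
        agreed = sum(c for c in column if c * sign > 0)
        merged.append(b + agreed)
    return merged


# Toy stand-ins for the three models in the recipe (made-up numbers).
toy_base = [0.0, 1.0, -1.0, 0.5]
toy_hermes = [0.2, 1.5, -1.2, 0.5]
toy_hermes_inv = [-0.1, 1.4, -0.6, 0.7]

toy_merged = dare_ties_merge(
    toy_base,
    [(toy_hermes, 0.60, 0.35), (toy_hermes_inv, 0.40, 0.30)],
    seed=0,
)
```

With densities as low as 0.35 and 0.30, most of each difference is thrown away and the remainder is scaled up by roughly 3x, so the result on any single parameter is noisy. That much is a general property of the method, not a diagnosis of why this particular merge failed.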


Safetensors · Model size: 13.3B params · Tensor type: BF16


Collection including athirdpath/NeuralHermes-Mistral-13b-DARE_blended-FAILURE

FailLabs

WITH EXPLANATIONS - Total failures and dead-ends. Learn from my mistakes. • 5 items • Updated Dec 2, 2023