Resources

QLoRA fine-tuning with a longer sequence length (max_length=2048, padding=True) causes RuntimeError: CUDA error: device-side assert triggered; shortening the length to 512 works!

#46 opened 12 months ago by nps798

MCQ Question Answering

#45 opened 12 months ago by Ayush8120

Is `added_tokens.json` intended to be here?

#43 opened 12 months ago by xzuyn

Adding `safetensors` variant of this model

#42 opened 12 months ago by nth-attempt

Adding `safetensors` variant of this model

#41 opened 12 months ago by nth-attempt

Mistral in French?

#40 opened 12 months ago by Giroud

Question answering

#39 opened 12 months ago by codegood

TensorFlow variant coming?

#37 opened 12 months ago by areinh

Default template and configuration for local run with GPU

#33 opened about 1 year ago by brunoedcf

Still throws refusals

#31 opened about 1 year ago by Phoenixalight

Has a massive repetition problem

#29 opened about 1 year ago by Delcos

Which Mistral datacenter was used for training?

#25 opened about 1 year ago by niko32

ValueError: Please specify `target_modules` in `peft_config`

#23 opened about 1 year ago by Tapendra

13b in the future?

#21 opened about 1 year ago by deleted

Architectural difference with Llama

#20 opened about 1 year ago by imone

How to deploy the model locally?

#19 opened about 1 year ago by chao0524

Quantized version of Mistral 7B (4-bit or 8-bit)

#18 opened about 1 year ago by ianuvrat

FlashAttention support for Mistral HF Implementation

#17 opened about 1 year ago by mxxtsai

What are the datasets used to train the model?

#10 opened about 1 year ago by rv2307

Training data?

#8 opened about 1 year ago by dkgaraujo

Safetensor weights

#6 opened about 1 year ago by ghvandoorn

Dataset contamination tests

#1 opened about 1 year ago by imone