QLoRA fine-tuning with a longer sequence length (`max_length=2048`, `padding=True`) causes RuntimeError: CUDA error: device-side assert triggered; shortening the length to 512 works!
#46 opened 12 months ago by nps798
MCQ Question Answering
#45 opened 12 months ago by Ayush8120
Is `added_tokens.json` intended to be here?
#43 opened 12 months ago by xzuyn, 4 comments
Adding `safetensors` variant of this model
#42 opened 12 months ago by nth-attempt, 4 comments
Adding `safetensors` variant of this model
#41 opened 12 months ago by nth-attempt
Mistral in French?
#40 opened 12 months ago by Giroud, 6 comments
Question answering
#39 opened 12 months ago by codegood, 11 comments
TensorFlow variant coming?
#37 opened 12 months ago by areinh, 1 comment
Default template and configuration for local run with GPU
#33 opened about 1 year ago by brunoedcf
Still throws refusals
#31 opened about 1 year ago by Phoenixalight, 1 comment
Has a massive repetition problem
#29 opened about 1 year ago by Delcos, 14 comments
Which Mistral datacenter was used for training?
#25 opened about 1 year ago by niko32, 2 comments
ValueError: Please specify `target_modules` in `peft_config`
#23 opened about 1 year ago by Tapendra, 3 comments
13B in the future?
#21 opened about 1 year ago by a deleted user, 9 comments
Architectural difference with Llama
#20 opened about 1 year ago by imone, 1 comment
How to deploy the model locally?
#19 opened about 1 year ago by chao0524, 4 comments
Quantized version of Mistral 7B (4-bit or 8-bit)
#18 opened about 1 year ago by ianuvrat, 3 comments
FlashAttention support for Mistral HF Implementation
#17 opened about 1 year ago by mxxtsai, 1 comment
What are the datasets used to train the model?
#10 opened about 1 year ago by rv2307, 1 comment
Training data?
#8 opened about 1 year ago by dkgaraujo, 12 comments
Safetensor weights
#6 opened about 1 year ago by ghvandoorn
Dataset contamination tests
#1 opened about 1 year ago by imone, 1 comment