Response quality much lower than with 2K version

#1
by mikolodz - opened

Hi,

Is it normal that I'm getting really poor responses from the SuperHOT models I tested?
E.g. they stop in the middle of a sentence (Wizard-Vicuna 13B 8K) or add strange formatting like ```makefile (Vicuna 1.3 13B 8K), etc.

Maybe I'm missing something? I use ooba + AutoGPTQ, nothing fancy.

@mikolodz I think it's supposed to be used with ExLlama under "load model" (make sure you have the latest oobabooga update so you can choose the loader from the drop-down menu), and if there's still an issue, maybe lower the token count to 6000 (Aitrepreneur's latest video shows how on YouTube: https://www.youtube.com/watch?v=199h5XxUEOY&ab_channel=Aitrepreneur)

Ah yeah, this model (and all the SuperHOT-8K models) isn't supposed to be loaded with AutoGPTQ; you use ExLlama.
Otherwise it'll probably be buggy lol

@Renegadesoffun's linked video is really noob-friendly; it's how I figured out how to use it too!
Good luck :D

It will work with AutoGPTQ, as long as you set trust_remote_code = True and set the desired context length by editing config.json. I explained this in the Readme.
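For illustration, here's a minimal sketch of that config.json edit; the field name (max_position_embeddings) and the local path are assumptions on my part, so check the repo's README and config.json for the exact key and value:

```python
# Sketch only: bump the context length in the model's config.json before
# loading with AutoGPTQ. The key name and path below are assumptions --
# check the repo's README/config.json for the exact field.
import json

cfg_path = "models/Wizard-Vicuna-13B-SuperHOT-8K-GPTQ/config.json"  # hypothetical local path

with open(cfg_path) as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 8192  # desired context length, e.g. 8192 or 4096

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```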

But ExLlama is definitely preferable, as it's faster, uses less VRAM, and the controls are directly integrated into the UI. I'd only suggest using AutoGPTQ if you want to use the model from Python code, or outside text-generation-webui.
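If you do go the Python route, here's a rough sketch of loading with AutoGPTQ; the repo name is just an example and the exact arguments may differ between AutoGPTQ versions:

```python
# Rough sketch of loading a SuperHOT GPTQ model with AutoGPTQ from Python.
# The repo name is an example; argument details may vary by AutoGPTQ version.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "TheBloke/Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GPTQ"  # example repo

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_safetensors=True,
    trust_remote_code=True,  # needed so the extended-context code in the repo is used
    device="cuda:0",
)

prompt = "USER: Hello, who are you?\nASSISTANT:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```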

Thank you, mate! Indeed, lowering the token count to 6144 seems to help :)

BTW, I'd tried both AutoGPTQ and ExLlama before, but only the above helped. Is it due to the response token count? I'd been using max_new_tokens of 2048 or 1024 and max_seq_len = 8192 with no luck on a 3090. It gave me either short responses or pure nonsense.
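One hedged guess at why 6144 helps, just arithmetic, assuming the prompt and the reply have to share the same max_seq_len window:

```python
# Back-of-the-envelope check: with an 8192-token window and 2048 tokens reserved
# for the reply, only 6144 tokens remain for the prompt/history. If the prompt is
# allowed to grow past that budget, older context gets cut off or generation degrades.
max_seq_len = 8192
max_new_tokens = 2048
prompt_budget = max_seq_len - max_new_tokens
print(prompt_budget)  # 6144
```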
