General feedback discussion.

#1
by Lewdiculous - opened

This model is next level in terms of formatting, personality, and expression. It is hands down the best LLM released in the past few months.
I was actually taken aback and somewhat moved by the expressiveness, and I got a cold asf heart.
It follows character cards competently, using the conversation examples well. Still occasionally forgets small details like they all do.
It's pretty lurid when it wants to be, driving encounters forward together with you using subtle prompting.

I'm using the Poppy_Porpoise_0.7_Context preset for context, the ChatML instruct preset, and the lewdicu-3.0.3-mistral-0.2 text completion preset. Go crazy with the temperature.

I was very impressed with the consistently great response formatting.

Are you using it with SillyTavern? I used the recommended samplers and it veered a bit off track with just KoboldCpp.

Using the latest SillyTavern, yeah.

This might be helpful (samplers):

https://huggingface.co/Virt-io/SillyTavern-Presets/discussions/5#664d6fb87c563d4d95151baa

This model is really interesting, but for me it tries really hard to make things go lewd. At just a hint of something sensual, it goes hardcore within a few turns. I'm using a fresh prompt and instruct preset from Virt-io and the samplers from the link above.

Looks like I was the target audience all along.
Thanks for the feedback.
πŸ˜…


Author feedback: @Sao10K

Oh, yes, I see now in the original description: "A model leaning towards NSFW, mention explicitly in prompts if you want to steer away. [Avoid Negative Reinforcement]".
That's it, I guess.

Since all my roleplays are NSFW it's hard for me to evaluate normal chats, haha.

the lewdicu-3.0.3-mistral-0.2 text completion preset

I'm new to this. I don't see any option to load a text completion preset. Where is it?

The import buttons are here:
(screenshot: import.jpg)

Since all my roleplays are NSFW it's hard for me to evaluate normal chats, haha.

lol don't feel too bad. I always start some big detailed world and story with a big overarching goal. End up ERPing with every girl I come across. I have yet to accomplish major goals.

.. End up ERPing with every girl I come across. I have yet to accomplish major goals.

Oh, nice, I've learned a new expression - "to ERP a girl" :)

A model leaning towards NSFW

I was able to mitigate this somewhat with these instructions: "Characters must correspond to their age, knowledge, and skills. Characters must match their level of sexual development."

Hello, I have a question: what's the difference between these?
IQ / Q
Q3 / Q4
S / M / XXS / NL / XS / K_M / K_S / K and Q8_0

I just checked another model and it's described there... but some explanation for each quant in this repo would be great, thanks.

@thesxw There's a lot on that topic to be honest. Click here if you want a detailed explanation on these suffixes/options.

You can use the same explanations as found in other places. Same rules apply.

On this topic, this is generally a recommended read; it should let you see how the quants stack up against each other.

One observation: in the Q4 range there are both IQ and Q quants. At least here, the regular Q quants have faster initial prompt/context ingestion speeds, but other than that they should be comparable for their BPW/sizes.

The general rule of thumb is that quants from the IQ4/Q4 range up should perform decently, with Q4_K_M (or maybe Q5_K_S with --flashattention at 8K context, if that's enough for you) being a very good option for quality and speed when considering 8GB of VRAM.

The Q5_K_S/M quants are the best balance of near full quality and speed/size, you can use the new --flashattention feature in KoboldCpp (8K context size) to maybe squeeze in more context or a step up in quant quality.
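
If it helps to see the arithmetic behind these recommendations: a quant's weight footprint is roughly parameters × bits-per-weight ÷ 8, which is why Q4/Q5 are the sweet spot for 8GB cards. A rough sketch below; the BPW figures are approximate values for these quant types and the 8B parameter count is just an example, so treat the output as ballpark only.

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# BPW figures are approximate for these quant types; treat as ballpark.
APPROX_BPW = {
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "Q5_K_S": 5.55,
    "Q5_K_M": 5.70,
    "Q8_0":   8.50,
}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Approximate on-disk / in-VRAM size of the weights in GB."""
    return params_billions * APPROX_BPW[quant] / 8

for quant in APPROX_BPW:
    # The KV cache and compute buffers use VRAM on top of this.
    print(f"8B model at {quant}: ~{approx_size_gb(8, quant):.1f} GB")
```

Remember the KV cache and compute buffers come on top of the weights, which is why Q4_K_M rather than Q5/Q6 tends to be the comfortable fit for 8GB.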

@Lewdiculous

The Q5_K_S/M quants are the best balance of near full quality and speed/size, you can use the new --flashattention feature in KoboldCpp (8K context size) to maybe squeeze in more context or a step up in quant quality.

Can/should I use --flashattention with 16k context size? I have 12G VRAM.

@JMan77 Sure! Looks good.

As long as your dedicated VRAM usage doesn't go above 11.5/12 GB, just to be safe, you can also use the Q5_K_M for optimal quality.
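
For reference, a minimal launch sketch with the flags discussed here; the model filename and layer count are hypothetical, so adjust them to your file and GPU.

```python
# Minimal KoboldCpp launch sketch (hypothetical filename/values).
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "model-Q5_K_M.gguf",  # hypothetical filename
    "--contextsize", "16384",        # 16K context
    "--flashattention",              # trims KV-cache VRAM usage
    "--usecublas",                   # CUDA offload on NVIDIA cards
    "--gpulayers", "33",             # all layers of an 8B model
], check=True)
```

If loading spills past your dedicated VRAM, lower --gpulayers or the context size first.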

Thank you for making this. I'm using Q8 with a 3090 and playing around with RoPE scaling, but I'm not getting good results. What settings should I use for 12K and 16K context respectively?

I'm using text-gen-webui so I have alpha value, rope_freq_base and compress_pos_emb.

Thanks again for all your help.

I'm unable to help you with text-gen-webui, I'd recommend using KoboldCpp – you just need to set the --contextsize you want and it should be all good to go.

You may ask the author on the original model's page about manual RoPE scaling with Ooba, or in the Ooba Discord/subreddit.

Tagging @saishf just in case they know.

That's fine then. 8K context is actually an improvement over my previous go-to LLM. I'll check out the resources you've provided, but it's not the end of the world if that doesn't work.

My previous go-to LLM was Nethena Mlewd 23B. While that may have been a tad more NSFW, this one has better logical reasoning. Thanks again for making this.

Edit: What's the deal with imatrix.dat? Can I use it in any way?

What's the deal with imatrix.dat? Can I use it in any way?

Ah, don't worry about that, it's just the file I used when making the quants. It is generated from the full model but doesn't do anything on its own, it's only for the quanting process.
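
For the curious, this is roughly the llama.cpp pipeline it comes from. A sketch only: binary names vary between versions (imatrix/quantize vs llama-imatrix/llama-quantize) and the filenames are placeholders.

```python
# Sketch of the imatrix workflow in llama.cpp (placeholder filenames).
import subprocess

# 1. Run calibration text through the full-precision model to record
#    which weights matter most; this produces imatrix.dat.
subprocess.run(["./llama-imatrix", "-m", "model-f16.gguf",
                "-f", "calibration.txt", "-o", "imatrix.dat"], check=True)

# 2. Hand imatrix.dat to the quantizer so it preserves precision where
#    the calibration data says it counts. The .dat isn't needed afterwards.
subprocess.run(["./llama-quantize", "--imatrix", "imatrix.dat",
                "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"], check=True)
```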

I can use 12K and 16K context here, so I think all you need is to grab the latest version of KoboldCpp and use it to run the model instead; you should be able to get at least 16K.

@mike-ceara Oh, I think I found a good resource in a previous Discussion on how you can get the values you need:

https://huggingface.co/Lewdiculous/InfinityRP-v1-7B-GGUF-IQ-Imatrix/discussions/1#66025ead660cfe25366c32d3

I use 16K with my models at Q5s, a rope theta of 1638400 and a scale of 1.0. I've found it to work perfectly. Llama3 scales really well.
But I agree, KoboldCpp is probably the better option if you use it as an API
(its webui isn't great though)

Text-Gen-Webui uses these:

  • alpha_value: Used to extend the context length of a model with a minor loss in quality. I have measured 1.75 to be optimal for 1.5x context, and 2.5 for 2x context. That is, with alpha = 2.5 you can make a model with 4096 context length go to 8192 context length.
  • rope_freq_base: Originally another way to write "alpha_value", it ended up becoming a necessary parameter for some models like CodeLlama, which was fine-tuned with this set to 1000000 and hence needs to be loaded with it set to 1000000 as well.
  • compress_pos_emb: The first and original context-length extension method, discovered by kaiokendev. When set to 2, the context length is doubled, 3 and it's tripled, etc. It should only be used for models that have been fine-tuned with this parameter set to different than 1. For models that have not been tuned to have greater context length, alpha_value will lead to a smaller accuracy loss.

Source
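
If you want to translate between the two knobs: as I understand it, text-gen-webui derives rope_freq_base from alpha_value with an NTK-style power rule, base' = base × alpha^(d/(d−2)), where d is the head dimension (128 for Llama-family models, giving the exponent 64/63). A sketch below; the function name is mine, and the default base assumes Llama-3's original rope theta of 500,000 (use 10,000 for Llama-2-era models).

```python
# NTK-aware alpha -> rope_freq_base conversion (sketch).
# base' = base * alpha^(d / (d - 2)), head dim d = 128 -> exponent 64/63.
def alpha_to_rope_freq_base(alpha: float, base: float = 500_000.0) -> float:
    # base = the model's original rope theta (500,000 for Llama-3).
    return base * alpha ** (64 / 63)

# Rule-of-thumb alpha = 2.5 for 2x context (8K -> 16K):
print(round(alpha_to_rope_freq_base(2.5)))  # ~1,268,000 - same ballpark as
                                            # the 1638400 theta quoted above
```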

This model is surprisingly decent at handling side characters in a 1-to-1 roleplay session. I turned off "Include Names" in ST and kept "Force enable for groups and personas" on. I don't think I've come across a 7B model that did it this well.
(screenshot: image.png)

Well... at least this model is honest about itself :)
(screenshot: 2024-0.png)

Haha, what even was that, she's going places.

I really like this model; I've been using it for the past 9 days. But I was wondering what the best settings for it would be. I was using the Poppy_Porpoise settings and slightly changed them to the recommended ones, but I have trouble finding much of a difference with Top K when I go from 0 to 40. Any clue?

@saishf did share a sampler preset before; it might have been tweaked since then.

The latest is here, slightly more creative and less repetitive:
https://files.catbox.moe/bncial.json
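
If you want to see what's inside before importing it, the preset is plain JSON; a quick sketch to dump its sampler values (using the URL above):

```python
# Peek at the shared sampler preset before importing it into SillyTavern.
import json
import urllib.request

URL = "https://files.catbox.moe/bncial.json"
with urllib.request.urlopen(URL) as response:
    preset = json.load(response)

# Print every setting so you can compare against your current ones.
for key, value in sorted(preset.items()):
    print(f"{key}: {value}")
```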

and the lewdicu-3.0.3-mistral-0.2 text completion preset, go crazy with the temperature.

Where do I find this preset?

and the lewdicu-3.0.3-mistral-0.2
Where do I find this preset?

@lustren
https://huggingface.co/Lewdiculous/Model-Requests/tree/main/data/presets

I'm sorry if this question has already been asked, but I'm very curious: is there a way to get the model to use typical emoticons like ":) 3" in SillyTavern and so on, like in Character.AI?

@Remkal94 You need to set that in the character itself, not necessarily the model. Add plenty of example dialogues - set the Example Messages Behavior to Always Include - with the expressions and emojis you want, in the format you want, and in the character description add things like "{{char}} always uses emojis such as ...".

You can share your character here after trying to edit it if you need more help with setting that up.
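
For illustration, the example dialogues in the card could look something like this (a made-up sketch; <START> and the {{user}}/{{char}} macros are SillyTavern's standard placeholders):

```
<START>
{{user}}: Good morning!
{{char}}: Morning!! :3 Did you sleep well? ^^
<START>
{{user}}: I passed the exam!
{{char}}: Ehehe, I knew you would~ :D
```

Plus a line in the description along the lines of: "{{char}} always uses emoticons such as :3, ^^ and :D in her messages."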

I did exactly as you described (I used this model). It works great and enlivens the dialogue so much, thanks a lot!

I noticed one thing: in the KoboldCpp console the character uses many more emoticons than make it into the final SillyTavern chat. For example, the KoboldCpp console may show several different emoticons in a row, while only one of them is displayed in SillyTavern. Changing the Repetition Penalty has no effect. Any thoughts?

If they show up in your message logs but not in the SillyTavern UI, it's probably a token issue.
This seems similar: https://github.com/LostRuins/koboldcpp/issues/353
I guess you could try Ooba to rule out SillyTavern being at fault.
If it breaks with Kobold and not with Ooba, or breaks with both, opening an issue in the respective repo would probably be ideal.

In Ooba I don't see chat logs in the console, so it's hard to tell whether any emoticons are missing.

It seems the problem is more on the KoboldCpp side, since Kobold's own webui chat has the same issue: there are, say, 5 emoticons in a sentence in the log, and only 3 of them show up in the chat itself.

Sign up or log in to comment