How about finetuning Meta Chameleon 30B?

#2 by Slaaaaaau - opened

A few days ago Meta released Chameleon, a 30B LLM. I think it's gonna be superb for 24 GB VRAM cards, and so would a finetune of it.

This isn't a finetune, it's a merge.
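For what it's worth, a merge just combines existing checkpoints in weight space instead of training on new data. Here's a minimal sketch of the idea using a plain linear average in PyTorch/transformers — illustrative only, not the actual recipe used for this model:

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: a simple linear (weighted-average) merge of two
# same-architecture checkpoints. Real merge recipes are usually fancier.
def linear_merge(model_a_id: str, model_b_id: str, weight_a: float = 0.5):
    a = AutoModelForCausalLM.from_pretrained(model_a_id, torch_dtype=torch.float16)
    b = AutoModelForCausalLM.from_pretrained(model_b_id, torch_dtype=torch.float16)
    merged = a.state_dict()
    for name, tensor_b in b.state_dict().items():
        merged[name] = weight_a * merged[name] + (1.0 - weight_a) * tensor_b
    a.load_state_dict(merged)
    return a  # "a" now holds the merged weights
```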

Also, I don't have enough VRAM to run such a thing, sadly. I have 12 GB of RAM (which is more like 8 GB after OS overhead), a GTX 1050, and an Intel UHD Graphics 630 that I use to offload the KV cache.
At most I can run an 8B with 16K context if I offload 11 layers and set batch processing to 64.
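Roughly, that setup looks like this with llama-cpp-python — just a sketch, and the file name is a placeholder:

```python
from llama_cpp import Llama

# Sketch of a low-VRAM setup: most of the model stays in system RAM,
# only 11 layers go to the GPU, and the prompt batch size is kept small.
llm = Llama(
    model_path="L3-8B-model.Q4_K_M.gguf",  # placeholder GGUF path
    n_gpu_layers=11,   # offload 11 layers to the GTX 1050
    n_ctx=16384,       # 16K context window
    n_batch=64,        # batch processing set to 64
)
```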

And as much as I'd love to keep trying to merge large models, I'm constrained by the 75 GB HDD limit on both Colab and Kaggle, which is hardly enough to do anything fancy, if I can do anything at all. Sorry.

Maybe you should try the Colab TPU runtime. It has around 250 GB of storage.

Really? Alright, I'll give it a shot once my free time has replenished; however, I won't be able to test the models before releasing them.

I will follow your updates!
In any case, Stheno has updated his model: L3-8B-Stheno-v3.3-32K.
Training Details:
Trained at 8K Context -> Expanded to 32K Context with PoSE training.
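As I understand it, the PoSE trick is to keep training on 8K windows while randomly skipping the position IDs so they cover the full 32K range. A minimal sketch of that idea (my simplification, not the actual training code):

```python
import random

# Sketch of PoSE-style position skipping: the training window stays 8K tokens,
# but a random gap is inserted into the position ids so they span up to 32K.
def pose_position_ids(seq_len: int = 8192, target_ctx: int = 32768) -> list[int]:
    split = random.randint(1, seq_len - 1)          # cut the window into two chunks
    skip = random.randint(0, target_ctx - seq_len)  # random positional gap
    first = list(range(split))                          # positions 0 .. split-1
    second = [p + skip for p in range(split, seq_len)]  # shifted second chunk
    return first + second  # length == seq_len, max position < target_ctx
```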

Now I'm waiting for Umbral with more context (goooosh, 32K for group chats is like the long-awaited oxygen!). I really want Umbra to become smarter and better able to navigate long chats.

I'll go test L3-8B-Stheno-v3.3-32K and see how it works at 32K context. So far I'm not very satisfied with the results.
At one time I really liked the BlackOasis + Stheno merge.

upd:
Meh, I tested it on my group chat with 300+ messages: at 32K it responds completely incoherently, even with the temperature reduced to 0.4. As the context is brought back down to 16K (my default), the LLM becomes more coherent with at least the last dozen messages, instead of trying to make them up again.
I have a strong feeling that 8B models scale very poorly past 16K context. I've tried many different Llama 3 variants and see roughly the same thing every time: either the model repeats words in a loop, or it completely loses track of what is happening and the output reads more like nonsense.

Settings:

virt-IO/SillyTavern-Presets, Prompts/LLAMA-3 2.0
Temperature 0.6, min-P 0.075, repetition penalty 1.1, top-k 50
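For reference, those samplers map roughly to something like this in llama-cpp-python (just a sketch; min-P needs a reasonably recent build, and the model path and prompt are placeholders):

```python
from llama_cpp import Llama

# Sketch: the same sampler settings applied through llama-cpp-python.
llm = Llama(model_path="L3-8B-Stheno-v3.3-32K.Q4_K_M.gguf", n_ctx=16384)  # placeholder path
output = llm.create_completion(
    prompt="<SillyTavern-formatted prompt goes here>",  # placeholder
    max_tokens=256,
    temperature=0.6,
    min_p=0.075,         # min-P sampling
    repeat_penalty=1.1,  # repetition penalty
    top_k=50,
)
print(output["choices"][0]["text"])
```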

Umbral-Mind and Halu-L3-Stheno-BlackOasis give me a much more enjoyable experience at 16K context. BlackOasis is better at remembering formatting, and Umbral is better at creativity.

> A few days ago Meta released Chameleon, a 30B LLM.

I checked it out, but I'm not exactly sure how I'm supposed to use it; I've never seen a model structured like this before.

> Now I'm waiting for Umbral with more context (goooosh, 32K for group chats is like the long-awaited oxygen!). I really want Umbra to become smarter and better able to navigate long chats.

I made a 1.0.1 version, but after reading about some of the issues with it I ended up putting it in my archival org. If you're still curious to try it, just go to Cas-Archive.

I'm also working on a new model known as Merger Omelette; feel free to give it a try whenever you want.

OK, I'll try both models.
I doubt that Stheno 3.3 (which is in v1.0.1) is the final version. I still didn't like its behavior: after trying different quant variants up to fp16 and different context sizes, I couldn't get it to hold a coherent dialogue in a chat with 300+ messages. But who knows how it'll behave after the merge.
