Thank you :)

#1 by n-Coder - opened

I have no clue how you trained this, but this is by far the best and most realistic RVCv2 model I've seen.

Seriously. It's one of the very few that actually sound natural and authentic, with good audio quality and minimal artifacts, and work for both speaking and singing in real time (I'm using w-okada's RVCC).

Owner

It's trained with a semi-clean dataset. I found out that training a model on a dataset that's not super clean gives me more realistic models than a super-clean one does.

e.g. the Nekrolina model, too

Semi-clean as in everything is consistent and filtered the same but only lightly? Or as in a mix of clean takes and a few intentionally unfiltered ones?
Any special preprocessing voodoo or just the usual UVR5 stuff and cutting into chunks?

Thanks for the hint btw, the Nekrolina one turned out great, too!
Your TsunamiCat model still has a higher dynamic range, though, so it performs even better on quieter parts. Like when whispering or talking very quietly, the Tsunami one goes into a sexy vocal fry, which really helps with overall authenticity. Most models just add unnatural background noise or noise-gate artifacts in that range.

Owner

I basically just do what I call "scuffed" models: I run the original dataset through a background remover like UVR with the Vocal FT model, then manually remove stuff like TTS.

As for background noise, I intentionally ignore it (i.e. keep it in, since I'm lazy).
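
For reference, here's a minimal sketch of what the "cutting into chunks" half of that kind of pipeline could look like, assuming the vocals have already been isolated with UVR5's Vocal FT model in the GUI and that pydub is installed. The folder names and the 10-second chunk length are placeholders, not the owner's actual settings.

```python
# Rough sketch of the chunking step after UVR/Vocal FT has already
# isolated the vocals. Folder names and chunk length are placeholders.
from pathlib import Path

from pydub import AudioSegment

VOCALS_DIR = Path("uvr_vocals")      # UVR "Vocals" stems go here (assumed)
CHUNKS_DIR = Path("dataset_chunks")  # what gets fed to RVC training
CHUNKS_DIR.mkdir(exist_ok=True)

CHUNK_MS = 10_000  # ~10-second chunks; adjust to taste

for wav in sorted(VOCALS_DIR.glob("*.wav")):
    audio = AudioSegment.from_file(str(wav))
    for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
        chunk = audio[start:start + CHUNK_MS]
        # skip tiny leftover tails so training doesn't see near-empty files
        if len(chunk) < 2_000:
            continue
        chunk.export(str(CHUNKS_DIR / f"{wav.stem}_{i:03d}.wav"), format="wav")
```

If you'd rather script the separation step too, the python-audio-separator package wraps the same family of UVR models, but running them through the UVR5 GUI as described above works just as well; anything that still sounds off after chunking (TTS, crosstalk, etc.) gets removed by hand.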
