---
license: creativeml-openrail-m
tags:
- computer vision
- stable-diffusion
- stable-diffusion-2-1
- photography
- photoreal
---

# Deprecation notice

This model was a research project and is deprecated in favour of ptx0/pseudo-flex-base.

# Capabilities

This model is capable of producing photorealistic images of people.

It retains much of the base 2.1-v model knowledge, as its text encoder is minimally tuned.

# Limitations

This model does not produce perfect results every time.

This model cannot reproduce most real people. Instead, it makes "Derp-a-Like" equivalents to real people, which I prefer.

This model is not great at abstract imagery or digital art, though it certainly can produce a variety of amazing art styles.

# Dataset

* cushman (8000 kodachrome slides from 1939 to 1969)
* midjourney v5.1-filtered (about 22,000 upscaled v5.1 images)
* national geographic (about 3-4,000 >1024x768 images of animals, wildlife, landscapes, history)
* a small dataset of stock images of people vaping / smoking

# Training parameters

* polynomial learning rate scheduler shared between the text encoder (TE) and U-Net, starting at 4e-8 and decaying to 1e-8
* batch size 15, gradient accumulation steps 10 => effective batch size 150
* target is 30,000 steps, but training will likely stop sooner
* betas with terminal SNR enforced

# Training goals

* explore the effects of terminal SNR scheduling
* improve faces, especially "at a distance"
* improve composition, e.g. completeness of the resulting image
* improve prompt comprehension, e.g. "do what i want, even if it is weird"
* retain / introduce a slightly colourful flavour due to the midjourney data
* enhance understanding of the past, through the Cushman collection
* retain the ability to produce natural landscapes and animals via National Geographic

# Observations

* at 1650 steps, we still haven't cracked the code on faces.
* at 250 steps, we had amazing photoreal Mars landscapes that have carried forward mostly to 1650 steps
* lighting and composition are at their best

# Future work

This model inspired the search for a solution to the proliferation issue that led me to ttj/flex-diffusion-2-1, which led to the creation of ptx0/pseudo-flex-base, another photoreal model with multiple aspect support.

This model was trained **purely** on 768x768 square images, which were randomly resized and cropped. It can produce some higher resolution landscapes, but it cannot reliably do higher resolution subjects without deformities.
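
# Sketch: learning rate and batch size

To make the numbers under Training parameters concrete, here is a minimal sketch of the effective batch size arithmetic and a polynomial decay from 4e-8 to 1e-8 over the 30,000-step target. The function names and the `power` exponent are assumptions for illustration. The card does not state the exponent or any warmup, so `power=1.0` (a linear decay) is only a placeholder, and this is not the actual training code.

```python
def effective_batch_size(per_step_batch: int, accumulation_steps: int) -> int:
    """Samples contributing to one optimizer update."""
    return per_step_batch * accumulation_steps


def polynomial_lr(
    step: int,
    total_steps: int = 30_000,  # target step count from the card
    lr_init: float = 4e-8,      # starting learning rate from the card
    lr_end: float = 1e-8,       # final learning rate from the card
    power: float = 1.0,         # ASSUMPTION: exponent is not given in the card
) -> float:
    """Polynomial decay from lr_init to lr_end over total_steps."""
    step = min(max(step, 0), total_steps)  # clamp so the value never leaves the range
    remaining = 1.0 - step / total_steps   # fraction of training still to go
    return (lr_init - lr_end) * remaining ** power + lr_end


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(15, 10))
    for s in (0, 250, 1650, 15_000, 30_000):
        print(f"step {s:>6}: lr = {polynomial_lr(s):.3e}")
```

Because the same schedule is shared between the text encoder and the U-Net, a single function like this would drive both parameter groups.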
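
# Sketch: terminal SNR betas

The card lists "terminal SNR enforced" betas without saying how they were computed. One common approach is the rescaling described in "Common Diffusion Noise Schedules and Sample Steps are Flawed" (Lin et al.), which shifts and scales the square root of the cumulative alphas so that the final timestep has a signal-to-noise ratio of exactly zero. The sketch below assumes that approach. It also assumes the Stable Diffusion style "scaled linear" schedule with 1000 steps, `beta_start=0.00085` and `beta_end=0.012`. All function names are hypothetical.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """ASSUMED base schedule: linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the last timestep has alpha_bar = 0, i.e. SNR = 0."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]

    # Shift so the last value is zero, then scale so the first value is kept.
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]

    # Convert sqrt(alpha_bar) back to per-step alphas, then to betas.
    alphas_bar = [s * s for s in sqrt_ab]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    base = scaled_linear_betas()
    fixed = rescale_zero_terminal_snr(base)
    print("terminal alpha_bar before:", cumulative_alphas(base)[-1])
    print("terminal alpha_bar after: ", cumulative_alphas(fixed)[-1])
```

With a zero terminal SNR the last timestep carries no signal, so epsilon prediction is undefined there. This is why the rescaling is normally paired with a v-prediction model such as the 2.1-v base mentioned under Capabilities.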