---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/ux_sZKB9snVPsKRT1TzfG.gif)

# Model Description

- **Developed by**: Natural Synthetics Inc.
- **Model type**: Diffusion-based text-to-image generative model
- **License**: CreativeML Open RAIL++-M License
- **Model Description**: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/hotshot-xl).

# Limitations and Bias

## Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.

## Bias

While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases.