---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/ux_sZKB9snVPsKRT1TzfG.gif)

# Model Description
- **Developed by**: Natural Synthetics Inc.
- **Model type**: Diffusion-based text-to-image generative model
- **License**: CreativeML Open RAIL++-M License
- **Model Description**: This model generates and modifies images from text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/hotshot-xl).
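
As a rough illustration of how a model with two text encoders can condition on a prompt, the sketch below concatenates per-token features from both encoders into a single context. This is a toy, pure-Python stand-in and not the repository's code: the helper names (`fake_encoder`, `combine`) are hypothetical, and the sizes (77 tokens, 768 features for CLIP-ViT/L, 1280 for OpenCLIP-ViT/G) are assumptions taken from the published SDXL design rather than from this card.

```python
# Illustrative only: toy stand-ins for the two fixed text encoders.
# Sizes below are assumptions based on the SDXL design, not read from this model.
from typing import List

SEQ_LEN = 77            # assumed token context length
DIM_CLIP_L = 768        # assumed hidden size of CLIP-ViT/L
DIM_OPENCLIP_G = 1280   # assumed hidden size of OpenCLIP-ViT/G

Embedding = List[List[float]]  # shape: [tokens][features]


def fake_encoder(dim: int, fill: float) -> Embedding:
    """Hypothetical stand-in that returns a constant [SEQ_LEN][dim] embedding."""
    return [[fill] * dim for _ in range(SEQ_LEN)]


def combine(a: Embedding, b: Embedding) -> Embedding:
    """Concatenate the per-token features of both encoders."""
    if len(a) != len(b):
        raise ValueError("both encoders must yield the same number of tokens")
    return [row_a + row_b for row_a, row_b in zip(a, b)]


context = combine(
    fake_encoder(DIM_CLIP_L, 0.0),
    fake_encoder(DIM_OPENCLIP_G, 1.0),
)
print(len(context), len(context[0]))  # 77 2048
```

In a real pipeline the two embeddings would come from the pretrained encoders themselves; the GitHub repository linked above is the authoritative reference for how this model is actually run.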


# Limitations and Bias
## Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.

## Bias
While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases.