---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- controlnet
- jax-diffusers-event
inference: true
datasets:
- mfidabel/sam-coyo-2k
- mfidabel/sam-coyo-2.5k
- mfidabel/sam-coyo-3k
language:
- en
library_name: diffusers
---

# ControlNet - mfidabel/controlnet-segment-anything

These are controlnet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning: segmentation maps. Some example images are shown below.

**prompt**: contemporary living room of a house

**negative prompt**: low quality

![images_0](./images_0.png)

**prompt**: new york buildings, Vincent Van Gogh starry night

**negative prompt**: low quality, monochrome

![images_1](./images_1.png)

**prompt**: contemporary living room, high quality, 4k, realistic

**negative prompt**: low quality, monochrome, low res

![images_2](./images_2.png)

## Limitations and Bias

- The model can't render text
- Landscapes with fewer segments tend to render better
- Some segmentation maps tend to render in monochrome (use a `negative_prompt` to work around this)
- Some generated images can be oversaturated
- Shorter prompts usually work better, as long as they make sense with the input segmentation map
- The model is biased toward producing painting-like images rather than realistic ones, as there are a lot of paintings in the training dataset

## Training

**Training Data** This model was trained using a segmented dataset based on the [COYO-700M Dataset](https://huggingface.co/datasets/kakaobrain/coyo-700m). The [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint was used as the base model for the controlnet.

The model was trained as follows:

- 25k steps with the [SAM-COYO-2k](https://huggingface.co/datasets/mfidabel/sam-coyo-2k) dataset
- 28k steps with the [SAM-COYO-2.5k](https://huggingface.co/datasets/mfidabel/sam-coyo-2.5k) dataset
- 38k steps with the [SAM-COYO-3k](https://huggingface.co/datasets/mfidabel/sam-coyo-3k) dataset

In that particular order.

- **Hardware**: Google Cloud TPUv4-8 VM
- **Optimizer**: AdamW
- **Train Batch Size**: 2 x 4 = 8
- **Learning rate**: 0.00001 constant
- **Gradient Accumulation Steps**: 1
- **Resolution**: 512
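
The schedule and batch settings above can be tallied with a small sketch. It is not the training script: the helper names (`effective_batch_size`, `summarize`) are hypothetical, "k" is read as exactly 1,000 steps, and "2 x 4 = 8" is assumed to mean a per-device batch of 2 across 4 devices.

```python
# Training stages as listed above: (dataset, steps), in training order.
# Assumption: "25k" etc. are read as exactly 25_000 steps.
STAGES = [
    ("mfidabel/sam-coyo-2k", 25_000),
    ("mfidabel/sam-coyo-2.5k", 28_000),
    ("mfidabel/sam-coyo-3k", 38_000),
]

# Assumption: "2 x 4 = 8" is a per-device batch of 2 on 4 devices.
PER_DEVICE_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 1


def effective_batch_size(per_device=PER_DEVICE_BATCH,
                         devices=NUM_DEVICES,
                         accum=GRAD_ACCUM_STEPS):
    """Samples consumed per optimizer step."""
    return per_device * devices * accum


def summarize(stages=STAGES):
    """Return per-stage step ranges and cumulative totals."""
    batch = effective_batch_size()
    rows = []
    start = 0
    for dataset, steps in stages:
        rows.append({
            "dataset": dataset,
            "first_step": start,
            "last_step": start + steps - 1,
            # Counts samples processed, including repeats across epochs.
            "samples_seen": steps * batch,
        })
        start += steps
    return {
        "total_steps": start,
        "total_samples_seen": start * batch,
        "stages": rows,
    }


if __name__ == "__main__":
    summary = summarize()
    for row in summary["stages"]:
        print(f'{row["dataset"]}: steps {row["first_step"]}-{row["last_step"]}, '
              f'{row["samples_seen"]} samples')
    print("total steps:", summary["total_steps"])
    print("total samples seen:", summary["total_samples_seen"])
```

Under these assumptions the three stages add up to 91,000 optimizer steps and 728,000 samples processed. That figure counts repeated passes over the data, not unique images.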