metadata

license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
  - diffusers
  - controlnet
  - jax-diffusers-event
inference: true
datasets:
  - mfidabel/sam-coyo-2k
  - mfidabel/sam-coyo-2.5k
  - mfidabel/sam-coyo-3k
language:
  - en
library_name: diffusers

ControlNet - mfidabel/controlnet-segment-anything

These are ControlNet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning: segmentation maps generated with Segment Anything (SAM). Some examples follow.

prompt: contemporary living room of a house

negative prompt: low quality

prompt: new york buildings, Vincent Van Gogh starry night

negative prompt: low quality, monochrome

prompt: contemporary living room, high quality, 4k, realistic

negative prompt: low quality, monochrome, low res
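The prompt pairs above can be tried with a short script. The following is a minimal sketch rather than an official recipe: it assumes the repository can be loaded through the PyTorch classes of diffusers, that a CUDA device is available, and that the segmentation map is stored in a local file. The names `Example`, `EXAMPLES`, and `generate` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

# Repository IDs taken from the model card.
BASE_MODEL = "runwayml/stable-diffusion-v1-5"
CONTROLNET = "mfidabel/controlnet-segment-anything"


@dataclass(frozen=True)
class Example:
    prompt: str
    negative_prompt: str


# The three prompt pairs listed in this card.
EXAMPLES = [
    Example("contemporary living room of a house", "low quality"),
    Example("new york buildings, Vincent Van Gogh starry night",
            "low quality, monochrome"),
    Example("contemporary living room, high quality, 4k, realistic",
            "low quality, monochrome, low res"),
]


def generate(example, segmentation_map_path, steps=20, device="cuda"):
    """Generate one image from a prompt pair and a segmentation map file.

    Assumption: PyTorch-format ControlNet weights are available. If the
    repository only ships Flax weights, passing from_flax=True to
    ControlNetModel.from_pretrained may be needed.
    """
    # Heavy dependencies are imported here so the prompt list above can be
    # used without torch or diffusers installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    dtype = torch.float16 if device == "cuda" else torch.float32
    controlnet = ControlNetModel.from_pretrained(CONTROLNET, torch_dtype=dtype)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL, controlnet=controlnet, torch_dtype=dtype
    ).to(device)

    # The model was trained at a resolution of 512.
    condition = load_image(segmentation_map_path).resize((512, 512))
    result = pipe(
        example.prompt,
        image=condition,
        negative_prompt=example.negative_prompt,
        num_inference_steps=steps,
    )
    return result.images[0]
```

A call such as `generate(EXAMPLES[0], "segmentation_map.png").save("out.png")` would then write the result to disk; the file names here are placeholders.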

Limitations and Bias

The model can't render text
Landscapes with fewer segments tend to render better
Some segmentation maps tend to render in monochrome (use a negative_prompt to work around this)
Some generated images can be oversaturated
Shorter prompts usually work better, as long as they make sense with the input segmentation map
The model is biased toward producing paintings rather than realistic images, because the training dataset contains many paintings
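Since maps with fewer segments tend to render better, it can help to check roughly how many segments a map contains before using it. The sketch below is one possible heuristic, not part of the model: it assumes each segment is painted in a single flat color, and the function name `count_segments` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def count_segments(pixels, min_fraction=0.001):
    """Estimate the number of segments in a segmentation map.

    `pixels` is any iterable of hashable color values, for example the
    (R, G, B) tuples from a PIL image. Colors covering less than
    `min_fraction` of the image are ignored, since they are more likely
    to be edge artifacts than real segments.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0
    return sum(1 for n in counts.values() if n / total >= min_fraction)
```

With Pillow installed, the pixels of a map could be obtained with `Image.open(path).convert("RGB").getdata()`.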

Training

Training Data

This model was trained on a segmented dataset based on the COYO-700M dataset. The Stable Diffusion v1.5 checkpoint was used as the base model for the ControlNet.

The model was trained in three stages, in this order:

25k steps with the SAM-COYO-2k dataset
28k steps with the SAM-COYO-2.5k dataset
38k steps with the SAM-COYO-3k dataset
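The schedule above can be laid out as cumulative step ranges. This is a small arithmetic sketch that takes the rounded step counts at face value; `STAGES` and `stage_boundaries` are illustrative names, not part of any training script.

```python
from itertools import accumulate

# (dataset, steps) in training order, as listed in this card.
STAGES = [
    ("SAM-COYO-2k", 25_000),
    ("SAM-COYO-2.5k", 28_000),
    ("SAM-COYO-3k", 38_000),
]


def stage_boundaries(stages):
    """Return (dataset, start_step, end_step) for each stage.

    start_step is inclusive and end_step is exclusive, counting from 0.
    """
    ends = list(accumulate(steps for _, steps in stages))
    starts = [0] + ends[:-1]
    return [(name, s, e) for (name, _), s, e in zip(stages, starts, ends)]


total_steps = sum(steps for _, steps in STAGES)
```

Under these numbers the three stages cover steps 0 to 25,000, 25,000 to 53,000, and 53,000 to 91,000.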

Hardware: Google Cloud TPUv4-8 VM
Optimizer: AdamW
Train Batch Size: 2 x 4 = 8
Learning rate: 0.00001 (constant)
Gradient Accumulation Steps: 1
Resolution: 512
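The settings above can be collected into a small configuration object to make the batch-size arithmetic explicit. This is a sketch under one assumption: the "2 x 4" is read as a per-device batch size of 2 on 4 devices, which the card does not state. `TrainConfig` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    hardware: str = "Google Cloud TPUv4-8 VM"
    optimizer: str = "AdamW"
    per_device_batch_size: int = 2  # assumed meaning of the "2"
    num_devices: int = 4            # assumed meaning of the "4"
    learning_rate: float = 1e-5     # constant schedule
    gradient_accumulation_steps: int = 1
    resolution: int = 512

    @property
    def effective_batch_size(self):
        # Images consumed per optimizer update.
        return (self.per_device_batch_size
                * self.num_devices
                * self.gradient_accumulation_steps)

    def images_seen(self, steps):
        # Total images processed after `steps` optimizer updates.
        return steps * self.effective_batch_size


config = TrainConfig()
```

With the 91,000 total steps obtained by summing the three training stages, `config.images_seen(91_000)` gives 728,000 images processed, counting repeats.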