metadata
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
  - diffusers
  - controlnet
  - jax-diffusers-event
inference: true
datasets:
  - mfidabel/sam-coyo-2k
  - mfidabel/sam-coyo-2.5k
  - mfidabel/sam-coyo-3k
language:
  - en
library_name: diffusers

ControlNet - mfidabel/controlnet-segment-anything

These are ControlNet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning: segmentation maps. Example images were generated with the following prompts.

prompt: contemporary living room of a house

negative prompt: low quality

prompt: new york buildings, Vincent Van Gogh starry night

negative prompt: low quality, monochrome

prompt: contemporary living room, high quality, 4k, realistic

negative prompt: low quality, monochrome, low res
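The snippet below is a minimal sketch of running these weights with the diffusers `StableDiffusionControlNetPipeline`, not an official usage example. It assumes the repository hosts PyTorch-format weights (if it only has Flax weights, `ControlNetModel.from_pretrained` accepts `from_flax=True`), that you already have a segmentation map saved as an RGB image, and that 30 inference steps is a reasonable default. `generate` and its parameters are hypothetical names. The default negative prompt includes "monochrome", following the examples above. If the `runwayml/stable-diffusion-v1-5` repository id is no longer available, substitute a mirror of the same checkpoint.

```python
def generate(segmentation_map_path: str,
             prompt: str = "contemporary living room of a house",
             negative_prompt: str = "low quality, monochrome",
             seed: int = 0):
    """Generate one image conditioned on a segmentation map (sketch)."""
    # Heavy dependencies are imported lazily so that defining this helper
    # does not require torch/diffusers to be installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Assumption: PyTorch weights exist in the repo; otherwise add from_flax=True.
    controlnet = ControlNetModel.from_pretrained(
        "mfidabel/controlnet-segment-anything", torch_dtype=dtype
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=dtype
    ).to(device)

    # The ControlNet was trained at resolution 512, so resize the map to match.
    condition = load_image(segmentation_map_path).resize((512, 512))
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt,
        image=condition,
        negative_prompt=negative_prompt,
        num_inference_steps=30,
        generator=generator,
    )
    return result.images[0]
```

For example, `generate("segments.png").save("out.png")`, where `segments.png` is a placeholder file name.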

Limitations and Bias

  • The model can't render text
  • Landscapes with fewer segments tend to render better
  • Some segmentation maps tend to render in monochrome (adding "monochrome" to the negative prompt, as in the examples above, helps work around this)
  • Some generated images can be oversaturated
  • Shorter prompts usually work better, as long as they are consistent with the input segmentation map
  • The model is biased toward painting-like images rather than realistic ones, because the training dataset contains many paintings

Training

Training Data

This model was trained on a segmentation dataset derived from the COYO-700M dataset. The Stable Diffusion v1.5 checkpoint was used as the base model for the ControlNet.
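The card does not describe how the segmentation maps used for conditioning are rendered. As a purely hypothetical sketch, assuming each conditioning image is an RGB map in which every segment mask (for example, one produced by Segment Anything) is filled with a single flat random color, a list of boolean masks could be turned into such a map with NumPy. The function name `masks_to_color_map` and the largest-first painting order are illustrative choices, not the dataset's documented pipeline.

```python
import numpy as np


def masks_to_color_map(masks: list[np.ndarray], seed: int = 0) -> np.ndarray:
    """Paint each boolean (H, W) mask with a random RGB color (hypothetical).

    Larger masks are painted first so smaller segments stay visible on top.
    Pixels not covered by any mask remain black.
    """
    if not masks:
        raise ValueError("need at least one mask")
    height, width = masks[0].shape
    rng = np.random.default_rng(seed)
    color_map = np.zeros((height, width, 3), dtype=np.uint8)
    # Sort by area (number of True pixels), largest first.
    for mask in sorted(masks, key=lambda m: int(m.sum()), reverse=True):
        color = rng.integers(0, 256, size=3, dtype=np.uint8)
        color_map[mask] = color  # broadcast the (3,) color over masked pixels
    return color_map
```

Painting larger masks first means that, where masks overlap, the smaller segment wins and remains visible in the conditioning image.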

The model was trained as follows:

In that particular order.

  • Hardware: Google Cloud TPU v4-8 VM
  • Optimizer: AdamW
  • Train batch size: 2 per device x 4 devices = 8
  • Learning rate: 1e-5 (constant)
  • Gradient accumulation steps: 1
  • Resolution: 512
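The hyperparameters above can be collected into a small config object that makes the batch-size arithmetic explicit. This is an illustrative sketch: the `TrainingConfig` class and its field names are hypothetical, not the training script's actual arguments, and it assumes the TPU v4-8 VM exposes 4 JAX devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters listed in this card (field names are illustrative)."""
    per_device_batch_size: int = 2
    num_devices: int = 4              # assumed: JAX devices on the TPU v4-8 VM
    gradient_accumulation_steps: int = 1
    learning_rate: float = 1e-5       # constant schedule
    resolution: int = 512
    optimizer: str = "adamw"

    @property
    def effective_batch_size(self) -> int:
        # Number of samples contributing to a single optimizer update.
        return (self.per_device_batch_size
                * self.num_devices
                * self.gradient_accumulation_steps)


config = TrainingConfig()
print(config.effective_batch_size)  # 8
```

Raising `gradient_accumulation_steps` would multiply the effective batch size without increasing per-device memory use.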