
	---
	license: creativeml-openrail-m
	base_model: runwayml/stable-diffusion-v1-5
	tags:
	- stable-diffusion
	- stable-diffusion-diffusers
	- text-to-image
	- diffusers
	- controlnet
	- jax-diffusers-event
	inference: true
	datasets:
	- mfidabel/sam-coyo-2k
	- mfidabel/sam-coyo-2.5k
	- mfidabel/sam-coyo-3k
	language:
	- en
	library_name: diffusers
	---

	# ControlNet - mfidabel/controlnet-segment-anything

	These are ControlNet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning: segmentation maps. Some example images follow.

	prompt: contemporary living room of a house

	negative prompt: low quality
	![images_0](./images_0.png)

	prompt: new york buildings, Vincent Van Gogh starry night

	negative prompt: low quality, monochrome
	![images_1](./images_1.png)

	prompt: contemporary living room, high quality, 4k, realistic

	negative prompt: low quality, monochrome, low res
	![images_2](./images_2.png)
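The prompts above can be tried with a short script. What follows is a minimal sketch, not the author's own inference code: it assumes the repository's weights load through the PyTorch `ControlNetModel` class in `diffusers` (if only Flax weights are published, passing `from_flax=True` may be needed), and the helper names `build_call_kwargs`, `load_pipeline`, and `generate`, as well as the default of 30 inference steps, are illustrative choices rather than anything stated in this card.

```python
from typing import Any, Dict, Optional

CONTROLNET_ID = "mfidabel/controlnet-segment-anything"
BASE_MODEL_ID = "runwayml/stable-diffusion-v1-5"


def build_call_kwargs(
    prompt: str,
    negative_prompt: Optional[str] = None,
    num_inference_steps: int = 30,  # assumption: not specified in this card
) -> Dict[str, Any]:
    """Collect the text-side arguments for one generation call."""
    kwargs: Dict[str, Any] = {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
    }
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


def load_pipeline(from_flax: bool = False):
    """Build the ControlNet pipeline (hypothetical helper).

    Heavy imports are deferred so that nothing is downloaded
    until this function is actually called.
    """
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        CONTROLNET_ID, from_flax=from_flax, torch_dtype=torch.float32
    )
    return StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL_ID, controlnet=controlnet, torch_dtype=torch.float32
    )


def generate(pipe, segmentation_map, **call_kwargs):
    """Run one generation; `segmentation_map` is a PIL image used as conditioning."""
    return pipe(image=segmentation_map, **call_kwargs).images[0]
```

A typical call would be `generate(load_pipeline(), seg_map, **build_call_kwargs("contemporary living room of a house", "low quality"))`, where `seg_map` is a 512 x 512 segmentation map loaded with PIL.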

	## Limitations and Bias

	- The model can't render text
	- Landscapes with fewer segments tend to render better
	- Some segmentation maps tend to render in monochrome (use a negative_prompt to get around it)
	- Some generated images can be oversaturated
	- Shorter prompts usually work better, as long as they make sense with the input segmentation map
	- The model is biased toward producing paintings rather than realistic images, because the training dataset contains many paintings

	## Training

	Training data: this model was trained on a segmented dataset based on the [COYO-700M Dataset](https://huggingface.co/datasets/kakaobrain/coyo-700m).
	The [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint was used as the base model for the ControlNet.


	The model was trained as follows:

	- 25k steps with the [SAM-COYO-2k](https://huggingface.co/datasets/mfidabel/sam-coyo-2k) dataset
	- 28k steps with the [SAM-COYO-2.5k](https://huggingface.co/datasets/mfidabel/sam-coyo-2.5k) dataset
	- 38k steps with the [SAM-COYO-3k](https://huggingface.co/datasets/mfidabel/sam-coyo-3k) dataset

	In that particular order.
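The staged schedule above can be written down as data to see where each stage starts and ends. This is a small sketch under one assumption: the step counts are taken as per-stage figures (not cumulative), which the card does not state explicitly. `STAGES` and `stage_boundaries` are illustrative names.

```python
from itertools import accumulate

# (dataset, steps) in training order, as listed above.
# Assumption: each count is the number of steps run in that stage alone.
STAGES = [
    ("mfidabel/sam-coyo-2k", 25_000),
    ("mfidabel/sam-coyo-2.5k", 28_000),
    ("mfidabel/sam-coyo-3k", 38_000),
]


def stage_boundaries(stages):
    """Return (dataset, first_step, end_step_exclusive) for each stage."""
    ends = list(accumulate(steps for _, steps in stages))
    starts = [0] + ends[:-1]
    return [(name, start, end) for (name, _), start, end in zip(stages, starts, ends)]


BOUNDARIES = stage_boundaries(STAGES)
TOTAL_STEPS = BOUNDARIES[-1][2]  # 25k + 28k + 38k = 91k steps
```

Under that reading, the full run covers 91k optimizer steps across the three datasets.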

	- Hardware: Google Cloud TPUv4-8 VM

	- Optimizer: AdamW

	- Train Batch Size: 2 x 4 = 8

	- Learning rate: 1e-5 (constant)

	- Gradient Accumulation Steps: 1

	- Resolution: 512
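The hyperparameters above can be gathered into one configuration object, which also makes the batch-size arithmetic explicit. This is a sketch, not the training script's actual configuration: `TrainConfig` and its field names are hypothetical, and reading "2 x 4" as a per-device batch of 2 across 4 devices is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the settings listed above."""

    optimizer: str = "adamw"
    learning_rate: float = 1e-5          # constant, no schedule
    per_device_batch_size: int = 2       # assumption: the "2" in "2 x 4"
    num_devices: int = 4                 # assumption: the "4" in "2 x 4"
    gradient_accumulation_steps: int = 1
    resolution: int = 512

    @property
    def effective_batch_size(self) -> int:
        """Images consumed per optimizer step."""
        return (
            self.per_device_batch_size
            * self.num_devices
            * self.gradient_accumulation_steps
        )

    def samples_seen(self, steps: int) -> int:
        """Images processed after `steps` optimizer steps."""
        return steps * self.effective_batch_size


CONFIG = TrainConfig()
```

With gradient accumulation set to 1, the effective batch size equals the listed train batch size of 8.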