---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- controlnet
- jax-diffusers-event
inference: true
datasets:
- mfidabel/sam-coyo-2k
- mfidabel/sam-coyo-2.5k
- mfidabel/sam-coyo-3k
language:
- en
library_name: diffusers
---
    
# ControlNet - mfidabel/controlnet-segment-anything

These are ControlNet weights trained on [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) with a new type of conditioning: segmentation maps in the style of the Segment Anything Model (SAM). You can find some example images below, followed by a usage sketch.

**prompt**: contemporary living room of a house

**negative prompt**: low quality
![images_0](./images_0.png)

**prompt**: new york buildings,  Vincent Van Gogh starry night 

**negative prompt**: low quality, monochrome
![images_1](./images_1.png)

**prompt**: contemporary living room,  high quality, 4k, realistic

**negative prompt**: low quality, monochrome, low res
![images_2](./images_2.png)
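## Usage

Below is a minimal usage sketch with the diffusers PyTorch API. The segmentation-map path, dtype, and step count are illustrative; since the model was trained with JAX/Flax for the JAX Diffusers event, you may need `from_flax=True` (or the Flax pipeline classes) if only Flax weights are present in the repo.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Conditioning input: a SAM-style segmentation map (hypothetical local file)
seg_map = load_image("./segmentation_map.png")

controlnet = ControlNetModel.from_pretrained(
    "mfidabel/controlnet-segment-anything", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "contemporary living room of a house",
    image=seg_map,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```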

## Limitations and Bias

- The model can't render text
- Landscapes with fewer segments tend to render better
- Some segmentation maps tend to render in monochrome (a negative prompt helps; see the sketch after this list)
- Some generated images can be oversaturated
- Shorter prompts usually work better, as long as they make sense with the input segmentation map
- The model is biased toward producing painting-like images rather than realistic ones, as the training dataset contains many paintings
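For the monochrome issue above, passing a `negative_prompt` to the pipeline call is usually enough. A sketch reusing `pipe` and `seg_map` from the usage example:

```python
# The negative prompt mirrors the ones used for the sample images above;
# the exact wording is a suggestion, not a requirement.
image = pipe(
    "contemporary living room, high quality, 4k, realistic",
    image=seg_map,
    negative_prompt="low quality, monochrome, low res",
    num_inference_steps=30,
).images[0]
```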

## Training

**Training Data** This model was trained on a segmented dataset derived from the [COYO-700M Dataset](https://huggingface.co/datasets/kakaobrain/coyo-700m).
The [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint was used as the base model for the ControlNet.


The model was trained in three stages, in the following order (the datasets can be loaded as sketched after this list):

- 25k steps with the [SAM-COYO-2k](https://huggingface.co/datasets/mfidabel/sam-coyo-2k) dataset
- 28k steps with the [SAM-COYO-2.5k](https://huggingface.co/datasets/mfidabel/sam-coyo-2.5k) dataset
- 38k steps with the [SAM-COYO-3k](https://huggingface.co/datasets/mfidabel/sam-coyo-3k) dataset
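All three staged datasets are hosted on the Hub and can be loaded with 🤗 Datasets; a minimal sketch, assuming each dataset exposes a `train` split:

```python
from datasets import load_dataset

# The three SAM-segmented COYO subsets used for the staged training runs
sam_coyo_2k = load_dataset("mfidabel/sam-coyo-2k", split="train")
sam_coyo_2_5k = load_dataset("mfidabel/sam-coyo-2.5k", split="train")
sam_coyo_3k = load_dataset("mfidabel/sam-coyo-3k", split="train")
```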

- **Hardware**: Google Cloud TPUv4-8 VM

- **Optimizer**: AdamW

- **Train Batch Size**: 2 per device × 4 devices = 8

- **Learning rate**: 0.00001 (constant)

- **Gradient Accumulation Steps**: 1

- **Resolution**: 512
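
Since training ran on TPU with JAX/Flax, the optimizer settings above correspond roughly to the following optax setup; this is a sketch under the listed hyperparameters, not the exact training code:

```python
import optax

# AdamW with a constant 1e-5 learning rate; with gradient accumulation
# steps = 1, no accumulation wrapper is needed.
optimizer = optax.adamw(learning_rate=1e-5)
```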