---
license: other
license_name: flux1dev
tags:
- text-to-image
- character
- comic book
- art
- graphic novel
- flux
- flux-diffusers
base_model: black-forest-labs/FLUX.1-dev
instance_prompt: None
widget:
- example_title: "Temple"
  text: Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He navigates through an ancient temple, carefully avoiding the booby traps set in the stone walls. His muscles are taut as he reaches for an artifact glowing on a pedestal, his face showing a mix of caution and determination. Dust fills the air as he steps closer, high-quality graphic art.
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00026_.png
- example_title: "Super Wonderman"
  text: A action scene of graphic novel art character Wonderman wearing a black mask charging himself with super powers, Wonderman a muscular man in a green and red costume with a 'W' emblem on his chest is screaming in pain while being charged by electricity to gain super powers. He is standing with arms wide open consuming the energy around him. high quality graphic art 
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00021_.png
- example_title: "In Tundra"
  text: photo realistic still of Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He stands on a frozen tundra, a blizzard raging around him. His body is covered in frost, but he shows no signs of slowing down as he pushes forward through the snow, his eyes focused on a distant mountain peak where an ancient power is hidden. photo
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00036_.png
- example_title: "On Moon"
  text: Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He is walking away from aliens on the moon, high-quality graphic art.
  output:
    url: Wonderman Generation Samples/Wonderman1-no-workflow.jpg
- example_title: "Fighting"
  text: A photo realistic comic book character Wonderman wearing a black mask fighting a villain. In the foreground, Wonderman a muscular man in a green and red costume with a 'W' emblem on his chest. The background depicts an action scene of all sorts of fighting characters and a dark, cloudy sky. This is a cinematic action scene that is photorealistic similar to cosplay. photograph
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00023_.png
- text: Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He sits in a quiet diner late at night. His mask is still on, but his posture is relaxed as he sips a cup of coffee, watching the rain fall outside. The city is peaceful for now, but Wonderman knows this calm won't last. Modern realistic art style with detailed shading and highlights and high contrast and vivid colors
  example_title: "w/ Coffee"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00045_.png
- text: Wonderman in a black mask a muscular man in a green and red costume with a 'W' emblem on his chest—leaps from a crumbling skyscraper, dodging falling debris while holding a glowing energy sphere in his hand. His black mask is torn, but his face shows fierce determination as he hurls the sphere at an oncoming enemy ship. The sky is filled with smoke and fire from the battle, high-quality graphic art.
  example_title: "Falling"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00070_.png
- text: photo realistic still of Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He stands on a frozen tundra, a blizzard raging around him. His body is covered in frost, but he shows no signs of slowing down as he pushes forward through the snow, his eyes focused on a distant mountain peak where an ancient power is hidden. photo
  example_title: "In Tundra 2"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00037_.png
- text: Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He battles a pack of mutant wolves in an abandoned warehouse, his powerful strikes knocking them back one by one. The moonlight filters through broken windows, casting long shadows as Wonderman moves swiftly, his every motion precise and controlled, high-quality graphic art.
  example_title: "Wolves"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00030_.png
- text: Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He crouches in a rain-soaked alley, muscles tense as thunder rumbles in the background. He grips his glowing energy staff, ready to confront a shadowy figure in the distance. The city lights flicker behind him, high-quality graphic art.
  example_title: "Kneeling"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00020_.png
- text: photo of Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. He stands proudly in front of a massive explosion, framed in the golden hour's soft, warm lighting. His costume is brilliantly contrasted against the fiery background, with the photo perfectly timed to capture the intensity of the scene, high-resolution photograph.
  example_title: "Explosion"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00046_.png
- text: Wonderman wearing a black mask is a muscular man in a green and red costume with a 'W' emblem on his chest. In an action shot, Wonderman speeds through a crowded street, his figure tack sharp while the background is blurred with motion, capturing the sense of speed. The natural light creates a subtle lens flare on his mask in modern realistic art style with detailed shading and highlights and high contrast and vivid colors
  example_title: "Running"
  output:
    url: Wonderman Generation Samples/Wonderman-RunDiffusion-Flux-LoRA_00051_.png
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
---



<div style="display: flex; align-items: center; justify-content: space-between;"> <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/RD-Logo-dark.jpg" alt="Left Image" style="width: 30%;"> 
<p style="text-align: center; width: 40%;">
  <span style="font-weight: bold; font-size: 1.5em;">Flux Training Concept - Wonderman POC</span><br>
  Darin Holbrook - Chief Technology Officer<br>
  RunDiffusion.com / contact@rundiffusion.com
</p>
 <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/Wonderman%20Logo.jpg" alt="Right Image" style="width: 20%;"> </div>

<Gallery />

# Wonderman Proof of Concept - By RunDiffusion.com

## For this POC we needed to achieve these goals
- The concept cannot already exist in the Flux dataset. (That would be cheating.)
- The concept needed to be present but still leave room for creativity.
- The concept needed to resemble the subject with about 90% accuracy.
- The subject could not "take over" the model.
- We used the lowest-quality data we could find. (This was easy!)

**We chose Wonderman from 1947!**
Wonderman is in the public domain, so it can be freely shared, except where restricted by Flux's non-commercial license.

Flux thinks that "Wonderman" is "Superman"
![Flux thinks that "Wonderman" is "Superman"](Huggingface-assets/superman-flux.jpg)


## Data Used for Training
You can view the [raw low-quality data here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Raw%20Low%20Quality%20Data).
The training data was low resolution, cropped, oddly shaped, pixelated, and overall the worst possible data we've come across. That didn't stop us! AI to the rescue!
![Low Quality Training Data](Huggingface-assets/multiple-samples-training-data.png)

To fix the data we had to:
- Inpaint problem areas like backgrounds, signatures, and text
- Outpaint to expand images
- Upscale to get above 1024x1024 at a minimum
- Create variations to increase the dataset and provide diverse data

We were able to get the dataset to 13 images with these techniques.
The full dataset [is here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Cleaned%20and%20Captioned%20Data).
![Cleaned Wonderman Dataset](Huggingface-assets/multiple-samples-of-cleaned-data.png)
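As a rough illustration of the upscale requirement listed above, the sketch below works out the smallest whole-number upscale factor that brings an image's shorter side to at least 1024 px. The function names and the example image sizes are hypothetical; the actual upscaling tool used for this dataset is not specified here.

```python
import math

MIN_SIDE = 1024  # target from the cleanup list: both sides at least 1024 px


def upscale_factor(width: int, height: int, min_side: int = MIN_SIDE) -> int:
    """Return the smallest integer factor that lifts the shorter side to min_side."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    shorter = min(width, height)
    return max(1, math.ceil(min_side / shorter))


def plan_upscales(sizes: dict) -> dict:
    """Map each image name to the upscale factor it needs (1 means no upscale)."""
    return {name: upscale_factor(w, h) for name, (w, h) in sizes.items()}


# Hypothetical sizes for illustration only; these are not the real files.
example_sizes = {"scan_a.png": (300, 450), "scan_b.png": (1024, 2048)}
print(plan_upscales(example_sizes))
```

A factor of 1 means the image already meets the minimum and only needs inpainting or outpainting.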

### Captioning the Data
We are not entirely familiar with Flux's preferred captioning style. We understand that this model responds well to full descriptive sentences, so we went with that. Below are some examples of the images with their captions. We chose LLaMA v3 for captioning, inspired by this paper: https://arxiv.org/html/2406.08478v1
The system prompt used was basic and could likely benefit from further refinement.

A vintage comic book cover of Wonderman. On the cover, there are three main characters: Wonderman in a green costume with a large 'W' on his chest, a woman in a yellow and black outfit, and a smaller figure in a brown costume. Wonderman and the woman appear to be in a dynamic pose, suggesting action or combat. Wonderman is holding a thin, sharp object, possibly a weapon. The woman has a confident expression and is looking towards the viewer. The background is a mix of green and yellow, with some abstract designs.
![Vintage Wonderman](Cleaned%20and%20Captioned%20Data/00008.png)

Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Wonderman has a muscular physique, brown hair, and is wearing a black mask covering his eyes. He stands confidently with his hands by his sides. photo
![Standing Wonderman](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Cleaned%20and%20Captioned%20Data/00002.png)
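Captions like the two above have to end up somewhere the trainer can find them. A minimal sketch, assuming the trainer reads a same-named `.txt` file next to each image (a common convention for LoRA trainers, but an assumption here), is shown below. The helper names `write_caption` and `missing_captions` are hypothetical.

```python
from pathlib import Path


def write_caption(image_path, caption: str) -> Path:
    """Write the caption to a .txt file that shares the image's base name."""
    image_path = Path(image_path)
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(caption.strip() + "\n", encoding="utf-8")
    return caption_path


def missing_captions(folder) -> list:
    """List PNG images in the folder that do not have a caption file yet."""
    folder = Path(folder)
    return sorted(
        image.name
        for image in folder.glob("*.png")
        if not image.with_suffix(".txt").exists()
    )
```

Running `missing_captions` on the dataset folder before training is a cheap way to catch an image that would otherwise be trained without a caption.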

### Train the Data
All tasks were performed on a local workstation equipped with an RTX 4090, i7 processor, and 64GB RAM. Note that 32GB RAM will not suffice, as you may encounter out-of-memory (OOM) errors when caching latents. We did use RunDiffusion.com to test the resulting LoRAs, which let us launch five servers with five checkpoints and pick the one that converged best.
We're not going to dive into rank, learning rate, and similar settings, because they really depend on your goals and what you're trying to accomplish. But the rules below are good ones to follow.
- We used Ostris's ai-toolkit available here: https://github.com/ostris/ai-toolkit/tree/main
- Default config with LR: 4e-4 at Rank 16
- 2200 - 2600 steps saw good convergence. Even some checkpoints into the 4k step range turned out pretty good.

If targeting finer details, you may want to adjust the rank up to 32 and lower the learning rate. You will also need to run more steps if you do this.

**Training a style:** Using simple captions with clear examples to maintain a coherent style is crucial. Although caption-less LoRAs can sometimes work for styles, this was not within the scope of our goals, so we cannot provide specific insights.

**Training a concept:** You can choose either descriptive captions to avoid interfering with existing tokens or general captions that might interfere, depending on your intention. This choice should be intentional.
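The settings listed above can be summarized in a small sketch. This is not ai-toolkit's config format; `LoraRunSettings`, `finer_detail_variant`, and `checkpoints_to_compare` are hypothetical names, and the specific values passed for the finer-detail run are placeholders, since the text only says to raise the rank, lower the learning rate, and train longer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoraRunSettings:
    rank: int = 16                 # "Rank 16" from the list above
    learning_rate: float = 4e-4    # "LR: 4e-4" from the list above
    max_steps: int = 2600          # upper end of the 2200 - 2600 range


def finer_detail_variant(base, learning_rate, max_steps, rank=32):
    """Return a copy with higher rank, lower LR, and more steps, or raise if not."""
    if rank <= base.rank:
        raise ValueError("rank should go up for finer details")
    if learning_rate >= base.learning_rate:
        raise ValueError("learning rate should go down when rank goes up")
    if max_steps <= base.max_steps:
        raise ValueError("more steps are needed at a lower learning rate")
    return replace(base, rank=rank, learning_rate=learning_rate, max_steps=max_steps)


def checkpoints_to_compare(first_step, last_step, count):
    """Evenly spaced checkpoint steps between first_step and last_step, inclusive."""
    if count < 2:
        raise ValueError("need at least two checkpoints to compare")
    stride = (last_step - first_step) / (count - 1)
    return [round(first_step + i * stride) for i in range(count)]


base = LoraRunSettings()
# Placeholder values: the write-up does not give exact numbers for this variant.
detailed = finer_detail_variant(base, learning_rate=2e-4, max_steps=4000)
# Assumed spacing: five checkpoints spread over the range that converged well.
print(checkpoints_to_compare(2200, 2600, 5))
```

Spreading five checkpoints across the converged range is one way to line up a side-by-side comparison like the five-server test described earlier; which steps were actually compared is not stated.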

Captioning has never been more critical. Flux "gives you what you ask for" - and that's a good thing. You can train a LoRA on a single cartoon concept and still generate photo realistic people. You can even caption a cartoon in the foreground and a realistic scene in the background! This capability is BY DESIGN - so do not resist it - embrace it! (Spoiler alert next!)
![prompt different backgrounds]()
The next set of examples shows where the captioning really helps or hurts you. Again, depending on your goals, you will need to choose the path that fits what you're trying to accomplish.
Total training time for the LoRA was about 2 to 2.5 hours. That works out to roughly $1 to $2 on RunPod or Vast; running locally, the electricity will be even cheaper.
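To put the cost claim in numbers, here is a small back-of-the-envelope sketch. The 450 W figure is the RTX 4090's rated board power and is used as an upper bound; the electricity price is a placeholder assumption, so substitute your own rate.

```python
def electricity_cost(power_watts: float, hours: float, price_per_kwh: float) -> float:
    """Cost of running a device at a constant power draw for the given time."""
    if min(power_watts, hours, price_per_kwh) < 0:
        raise ValueError("inputs must be non-negative")
    return power_watts / 1000 * hours * price_per_kwh


def implied_hourly_rate(total_cost: float, hours: float) -> float:
    """Hourly rental rate implied by a total cost and a run time."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_cost / hours


# Assumptions: 450 W GPU draw, 2.5 hour run, $0.15 per kWh (placeholder price).
local = electricity_cost(450, 2.5, 0.15)
print(f"local electricity: ${local:.2f}")

# The $1 to $2 cloud estimate over 2 to 2.5 hours implies this hourly range.
print(f"cloud rate: ${implied_hourly_rate(1.0, 2.5):.2f} to ${implied_hourly_rate(2.0, 2.0):.2f} per hour")
```

Under these assumptions the GPU's share of a local run costs well under a dollar, which is consistent with the claim that local electricity is cheaper than renting.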
Now for the results! (This next file is big to preserve the quality)

## 500 Steps
![500 steps](Huggingface-assets/500-steps.jpg)

## How to Get Started with the Model

Use the code below to get started with the model.
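A minimal sketch using the `diffusers` library follows. It assumes you have access to the gated `black-forest-labs/FLUX.1-dev` weights and a GPU with enough memory. The prompt pattern mirrors the example prompts in this card's widget section; the sampler settings are assumptions rather than recommended values, and `weight_name` is left as `None` because the LoRA file name in the repository is not stated here.

```python
SUBJECT = (
    "Wonderman wearing a black mask is a muscular man in a green and red "
    "costume with a 'W' emblem on his chest."
)


def build_prompt(scene: str, style: str = "high-quality graphic art") -> str:
    """Combine the subject description, a scene, and a style tag into one prompt."""
    scene = scene.strip().rstrip(".,")
    return f"{SUBJECT} {scene}, {style}."


def generate(prompt, lora_repo="RunDiffusion/Wonderman-Flux-POC", weight_name=None, seed=0):
    """Load FLUX.1-dev with the LoRA applied and return one PIL image."""
    # Heavy imports live here so the prompt helper works without a GPU stack.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    # Pass weight_name explicitly if the repo holds more than one LoRA file.
    pipe.load_lora_weights(lora_repo, weight_name=weight_name)
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

    generator = torch.Generator("cpu").manual_seed(seed)
    result = pipe(
        prompt,
        height=1024,
        width=1024,
        num_inference_steps=28,  # assumed value, tune to taste
        guidance_scale=3.5,      # assumed value, tune to taste
        generator=generator,
    )
    return result.images[0]


prompt = build_prompt("He is walking away from aliens on the moon")
print(prompt)
# To render an image (downloads the base model on first run):
# generate(prompt).save("wonderman.png")
```

Swapping the `style` argument for something like `"high-resolution photograph"` is a quick way to test the point made earlier that the captioning style lets the same LoRA produce both graphic-art and photorealistic results.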


- **Developed by:** Darin Holbrook - RunDiffusion co-founder and Chief Technology Officer
- **Funded by:** RunDiffusion.com / RunPod.io
- **Model type:** Flux [dev] LoRA
- **License:** flux1dev https://huggingface.co/black-forest-labs/FLUX.1-dev
- **Finetuned from model:** https://huggingface.co/black-forest-labs/FLUX.1-dev