IrohXu committed
Commit 5711f36
Parents: 9729d10, e189fcb

Merge branch 'main' of https://huggingface.co/IrohXu/stable-diffusion-3-inpainting into main

Files changed (1)
  1. README.md +24 -3
README.md CHANGED
@@ -1,4 +1,6 @@
- # Stable Diffusion 3 Inpaint Pipeline
+ # Stable Diffusion 3 Inpainting Pipeline
+
+ This is the implementation of `Stable Diffusion 3 Inpainting Pipeline`.

  | input image | input mask image | output |
  |:-------------------------:|:-------------------------:|:-------------------------:|
@@ -8,7 +10,27 @@

  **Please ensure that the version of diffusers >= 0.29.1**

- # Demo
+ ## Model
+
+ [Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
+
+ For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
+
+ Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.
+
+
+ ### Model Description
+
+ - **Developed by:** Stability AI
+ - **Model type:** MMDiT text-to-image generative model
+ - **Model Description:** This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer
+ (https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders
+ ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl))
+
+ ## Demo
+
+ Make sure you upgrade to the latest version of diffusers: pip install -U diffusers. And then you can run:
+
  ```python
  import torch
  from torchvision import transforms
@@ -20,7 +42,6 @@ def preprocess_image(image):
      image = image.convert("RGB")
      image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
      image = transforms.ToTensor()(image)
-     image = image * 2 - 1
      image = image.unsqueeze(0).to("cuda")
      return image

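The changed hunks end mid-snippet, so the demo above only shows the imports and the updated `preprocess_image` helper (which now returns tensors in [0, 1] instead of [-1, 1]). For context, here is a minimal sketch of what a full inpainting run with this kind of pipeline typically looks like. The `StableDiffusion3InpaintPipeline` import, the checkpoint id, the `preprocess_mask` helper, and the prompt and file names are assumptions for illustration, not taken from this commit.

```python
import torch
from torchvision import transforms

# Assumption: a diffusers build that ships StableDiffusion3InpaintPipeline
# (this repo provides an equivalent custom pipeline class).
from diffusers import StableDiffusion3InpaintPipeline
from diffusers.utils import load_image


def preprocess_image(image):
    # Same helper as in the diff: RGB, center-crop to a multiple of 64,
    # tensor in [0, 1] on the GPU (the "* 2 - 1" shift was removed).
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    image = transforms.ToTensor()(image)
    image = image.unsqueeze(0).to("cuda")
    return image


def preprocess_mask(mask):
    # Hypothetical companion helper: single-channel mask, same crop, values in [0, 1].
    mask = mask.convert("L")
    mask = transforms.CenterCrop((mask.size[1] // 64 * 64, mask.size[0] // 64 * 64))(mask)
    mask = transforms.ToTensor()(mask)
    mask = mask.to("cuda")
    return mask


pipe = StableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

source = preprocess_image(load_image("input.png"))      # placeholder file names
mask = preprocess_mask(load_image("input_mask.png"))

result = pipe(
    prompt="a small cat sitting on a bench",             # placeholder prompt
    image=source,
    mask_image=mask,
    num_inference_steps=28,
    guidance_scale=7.0,
    strength=0.8,
).images[0]
result.save("output.png")
```

Whether the pipeline accepts these pre-built tensors or plain PIL images depends on the pipeline version; the stock diffusers inpainting pipelines also take PIL inputs directly, in which case the two preprocessing helpers can be dropped.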