# Stable Diffusion 3 Inpainting Pipeline

This is the implementation of `Stable Diffusion 3 Inpainting Pipeline`.

|        input image        |     input mask image      |          output           |
|:-------------------------:|:-------------------------:|:-------------------------:|

**Please ensure that the version of diffusers >= 0.29.1**

## Model

[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer (https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl)).

## Demo

Make sure you upgrade to the latest version of diffusers: `pip install -U diffusers`.
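To confirm that the installed diffusers release satisfies the 0.29.1 minimum stated above, a check along the following lines may help. This is only a minimal sketch: `release_tuple` and `meets_minimum` are hypothetical helper names (not part of diffusers), and the parsing compares leading numeric components only, so it does not order pre-release tags such as `.dev0` or `rc1` correctly.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.30.0.dev0' -> (0, 30, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.29.1") -> bool:
    """Simplified comparison: pre-release suffixes are ignored (an assumption)."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = metadata.version("diffusers")
except metadata.PackageNotFoundError:
    print("diffusers is not installed")
else:
    status = "OK" if meets_minimum(installed) else "too old, please upgrade"
    print(f"diffusers {installed}: {status}")
```
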
And then you can run:

```python
import torch
from torchvision import transforms

from pipeline_stable_diffusion_3_inpaint import StableDiffusion3InpaintPipeline
from diffusers.utils import load_image


def preprocess_image(image):
    # Convert to RGB and center-crop so both sides are multiples of 64.
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    image = transforms.ToTensor()(image)
    # Add a batch dimension and move the tensor to the GPU.
    image = image.unsqueeze(0).to("cuda")
    return image


def preprocess_mask(mask):
    # Convert to single-channel grayscale and apply the same crop as the image.
    mask = mask.convert("L")
    mask = transforms.CenterCrop((mask.size[1] // 64 * 64, mask.size[0] // 64 * 64))(mask)
    mask = transforms.ToTensor()(mask)
    mask = mask.to("cuda")
    return mask


pipe = StableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

source_image = load_image("./overture-creations-5sI6fQgYIuo.png")
source = preprocess_image(source_image)
mask = preprocess_mask(load_image("./overture-creations-5sI6fQgYIuo_mask.png"))

image = pipe(
    prompt=prompt,
    image=source,
    mask_image=mask,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=7.0,
    strength=0.6,
).images[0]

image.save("overture-creations-5sI6fQgYIuo_output.jpg")
```
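The `preprocess_image` and `preprocess_mask` helpers above center-crop each input so that both sides are multiples of 64, using the expression `size // 64 * 64`. The arithmetic can be sketched on its own, without torch or torchvision, as shown below. `cropped_size` and `center_crop_box` are hypothetical names introduced only for illustration, and the box uses floor division for the offsets, so it may differ by one pixel from the rounding `transforms.CenterCrop` applies.

```python
def cropped_size(width: int, height: int, multiple: int = 64) -> tuple:
    """Largest (width, height) not exceeding the input with both sides divisible by `multiple`."""
    new_w = width // multiple * multiple
    new_h = height // multiple * multiple
    if new_w == 0 or new_h == 0:
        raise ValueError(f"image must be at least {multiple}x{multiple} pixels")
    return new_w, new_h


def center_crop_box(width: int, height: int, multiple: int = 64) -> tuple:
    """(left, top, right, bottom) of a centered crop with the size from cropped_size."""
    new_w, new_h = cropped_size(width, height, multiple)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


# A 1000x700 input keeps 15 * 64 = 960 columns and 10 * 64 = 640 rows.
print(cropped_size(1000, 700))
print(center_crop_box(1000, 700))
```

The resulting box is in the (left, top, right, bottom) order that PIL's `Image.crop` accepts, in case the crop is done before converting to a tensor.
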