---
license: apache-2.0
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
  - art
  - t2i-adapter
  - stable-diffusion
  - image-to-image
---

# T2I-Adapter-SDXL - Lineart

T2I-Adapter is a network that provides additional conditioning to Stable Diffusion. Each T2I-Adapter checkpoint takes a different type of conditioning as input and is used with a specific base Stable Diffusion checkpoint.

This checkpoint provides lineart conditioning for the Stable Diffusion XL checkpoint.
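
For a quick sense of how the two pieces pair up, here is a minimal sketch (the full, runnable example is in the Example section below):

```py
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

# The adapter checkpoint supplies the lineart conditioning; the SDXL base
# checkpoint does the actual image generation.
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", adapter=adapter, torch_dtype=torch.float16
)
```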

## Model Details

- **Developed by:** T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** Apache 2.0
- **Resources for more information:** [GitHub Repository](https://github.com/TencentARC/T2I-Adapter), [Paper](https://arxiv.org/abs/2302.08453).
- **Cite as:**

```bibtex
@misc{mou2023t2iadapter,
  title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models},
  author={Chong Mou and Xintao Wang and Liangbin Xie and Yanze Wu and Jian Zhang and Zhongang Qi and Ying Shan and Xiaohu Qie},
  year={2023},
  eprint={2302.08453},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## Checkpoints

| Model Name | Control Image Overview | Control Image Example |
|---|---|---|
| `Adapter/t2iadapter_canny_sdxlv1` | Trained with canny edge detection | A monochrome image with white edges on a black background. |
| `Adapter/t2iadapter_sketch_sdxlv1` | Trained with PidiNet edge detection | A hand-drawn monochrome image with white outlines on a black background. |
| `Adapter/t2iadapter_depth_sdxlv1` | Trained with Midas depth estimation | A grayscale image with black representing deep areas and white representing shallow areas. |
| `Adapter/t2iadapter_openpose_sdxlv1` | Trained with OpenPose bone image | An OpenPose bone image. |
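
Each checkpoint in the table pairs with the detector that produces its control images. As a rough sketch of swapping in the depth checkpoint (assuming the `Adapter/t2iadapter_depth_sdxlv1` repository loads with `T2IAdapter.from_pretrained` and using default `controlnet_aux` preprocessing; the input URL is a placeholder):

```py
import torch
from controlnet_aux import MidasDetector
from diffusers import T2IAdapter
from diffusers.utils import load_image

# Midas produces the grayscale depth maps t2iadapter_depth_sdxlv1 was trained on.
midas = MidasDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2iadapter_depth_sdxlv1", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://example.com/input.jpg")  # placeholder URL
depth_image = midas(image)  # a PIL depth map, ready to pass to the pipeline
```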

## Example

To get started, first install the required dependencies:

```bash
pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl # for now
pip install git+https://github.com/patrickvonplaten/controlnet_aux.git # for conditioning models and detectors
pip install transformers accelerate safetensors
```

Generation is then a two-step process:

1. Images are first converted into the appropriate *control image* format.
2. The control image and prompt are passed to the `StableDiffusionXLAdapterPipeline`.

Let's have a look at a simple example using the Lineart Adapter.

```py
import torch
from diffusers import (
    AutoencoderKL,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLAdapterPipeline,
    T2IAdapter,
)
from diffusers.utils import load_image
from controlnet_aux.lineart import LineartDetector

# load adapter
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load euler_a scheduler and the fp16-safe SDXL VAE
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# load the lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")

# prepare the control image
url = "https://cdn.sortiraparis.com/images/80/77381/729517-oppenheimer-le-prochain-film-de-christopher-nolan-pour-2023-la-premiere-photo.jpg"
image = load_image(url)
image = line_detector(
    image, detect_resolution=384, image_resolution=1024
).resize((1024, 1024))

prompt = "cinematic still, a man, head shot"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"

gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=1,
    cond_tau=1,
).images
# the result is a list of PIL images; save the first one
gen_images[0].save("result.png")
```
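
If GPU memory is tight, a common diffusers pattern is to replace `.to("cuda")` on the pipeline with model CPU offloading (a sketch; inference gets slower, but peak VRAM drops considerably):

```py
# Keep weights on the CPU and move each sub-model to the GPU only while it runs.
# Use this instead of calling pipe.to("cuda").
pipe.enable_model_cpu_offload()
```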