metadata

license: openrail++
tags:
  - text-to-image
  - stable-diffusion

Overview

SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to make it simpler to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps at a batch size of 64 on a curated dataset of multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.
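
One way to picture the alternating-batch schedule is sketched below. This is only an illustration under stated assumptions, not the training code: the aspect ratios, the tier names, and the `batch_schedule` helper are hypothetical, and the sketch assumes each aspect ratio gets a low-resolution batch immediately followed by a high-resolution one. Only the step count and batch size come from the description above.

```python
from itertools import cycle, islice

# Hypothetical values for illustration only: the model card does not list
# the actual aspect ratios or bucket resolutions used during fine-tuning.
ASPECT_RATIOS = ["1:1", "3:2", "2:3"]
RESOLUTION_TIERS = ["low", "high"]  # e.g. around 512px vs. around 1024px

TOTAL_STEPS = 7000  # stated in the model card
BATCH_SIZE = 64     # stated in the model card


def batch_schedule(aspect_ratios, tiers):
    """Yield (aspect_ratio, tier) pairs forever, alternating tiers per ratio."""
    for ratio in cycle(aspect_ratios):
        for tier in tiers:
            yield ratio, tier


# First few steps of the assumed schedule.
first_steps = list(islice(batch_schedule(ASPECT_RATIOS, RESOLUTION_TIERS), 4))
print(first_steps)

# Total number of training samples drawn over the whole run.
samples_seen = TOTAL_STEPS * BATCH_SIZE
print(samples_seen)
```

Under these assumptions, every aspect ratio is seen at both resolution tiers equally often, which matches the stated goal of keeping high-resolution behaviour intact while improving low-resolution output.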

Note: It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.

Use it with Hotshot-XL (recommended)

Model Description

Developed by: Natural Synthetics Inc.
Model type: Diffusion-based text-to-image generative model
License: CreativeML Open RAIL++-M License
Model Description: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
Resources for more information: Check out our GitHub Repository.
Finetuned from model: Stable Diffusion XL 1.0

🧨 Diffusers

Make sure to upgrade diffusers to >= 0.18.2:

pip install diffusers --upgrade

In addition, make sure to install transformers, safetensors, and accelerate, as well as the invisible watermark package:

pip install invisible_watermark transformers accelerate safetensors
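
To confirm the environment before loading the model, a small check like the following may help. It is a minimal sketch using only the standard library: `version_tuple`, `meets_minimum`, and `installed_versions` are hypothetical helper names, the version parsing is deliberately simplified (pre-release suffixes are ignored), and the package names are taken as written in the install commands above.

```python
import re
from importlib import metadata

MIN_DIFFUSERS = "0.18.2"  # minimum version stated above
PACKAGES = [
    "diffusers",
    "transformers",
    "accelerate",
    "safetensors",
    "invisible_watermark",
]


def version_tuple(version):
    """Turn '0.18.2' or '0.19.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Simplified comparison; pre-release suffixes are not taken into account."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")

diffusers_version = versions["diffusers"]
if diffusers_version is not None and not meets_minimum(diffusers_version, MIN_DIFFUSERS):
    print(f"diffusers {diffusers_version} is older than {MIN_DIFFUSERS}; please upgrade")
```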

Running the pipeline (if you don't swap the scheduler, it will run with the default EulerDiscreteScheduler; in this example we swap it to EulerAncestralDiscreteScheduler):

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

model = "hotshotco/SDXL-512"

# Load the fp16 safetensors weights
pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
# Swap the default EulerDiscreteScheduler for the ancestral variant
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "a woman laughing"
negative_prompt = ""

# Generate at 512x512; target_size and original_size are SDXL size-conditioning inputs
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    guidance_scale=12,
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]
image.save("woman_laughing.png")
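
For non-square outputs "around" 512x512, one option is to keep the pixel count close to 512x512 and snap both sides to a multiple of 8, which the diffusers pipeline requires for `width` and `height`. The helper below is a hypothetical sketch of that arithmetic, not part of the model or of diffusers, and the choice of aspect ratios is an assumption for illustration.

```python
import math


def dims_near_512(aspect_ratio, base=512, multiple=8):
    """Return (width, height) with about base*base pixels and width/height ~ aspect_ratio."""
    width = base * math.sqrt(aspect_ratio)
    height = base / math.sqrt(aspect_ratio)

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


# Example aspect ratios (assumed for illustration).
for name, ratio in [("1:1", 1.0), ("16:9", 16 / 9), ("2:3", 2 / 3)]:
    print(name, dims_near_512(ratio))
```

The resulting pair can be passed as `width` and `height` in the pipeline call above in place of 512 and 512.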

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render legible text.
The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
Faces and people in general may not be generated properly.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.