Committed by hysts (HF staff)
Commit: 0b34766
Parent: 5c5cb40

Apply formatter

Files changed (6):
  1. .pre-commit-config.yaml +55 -0
  2. .vscode/settings.json +21 -0
  3. LICENSE +1 -1
  4. README.md +8 -11
  5. edit_app.py +15 -8
  6. requirements.txt +2 -2
.pre-commit-config.yaml ADDED
@@ -0,0 +1,55 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.4.0
+    hooks:
+      - id: check-executables-have-shebangs
+      - id: check-json
+      - id: check-merge-conflict
+      - id: check-shebang-scripts-are-executable
+      - id: check-toml
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: mixed-line-ending
+        args: ["--fix=lf"]
+      - id: requirements-txt-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/myint/docformatter
+    rev: v1.7.5
+    hooks:
+      - id: docformatter
+        args: ["--in-place"]
+  - repo: https://github.com/pycqa/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+        args: ["--profile", "black"]
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.5.1
+    hooks:
+      - id: mypy
+        args: ["--ignore-missing-imports"]
+        additional_dependencies:
+          ["types-python-slugify", "types-requests", "types-PyYAML"]
+  - repo: https://github.com/psf/black
+    rev: 23.9.1
+    hooks:
+      - id: black
+        language_version: python3.10
+        args: ["--line-length", "119"]
+  - repo: https://github.com/kynan/nbstripout
+    rev: 0.6.1
+    hooks:
+      - id: nbstripout
+        args:
+          [
+            "--extra-keys",
+            "metadata.interpreter metadata.kernelspec cell.metadata.pycharm",
+          ]
+  - repo: https://github.com/nbQA-dev/nbQA
+    rev: 1.7.0
+    hooks:
+      - id: nbqa-black
+      - id: nbqa-pyupgrade
+        args: ["--py37-plus"]
+      - id: nbqa-isort
+        args: ["--float-to-top"]
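Note (not part of the commit): with this configuration in place, the usual workflow is to install the hooks once and then run them over the whole repository, which is presumably how this "Apply formatter" commit was produced. Below is a minimal sketch of that workflow, assuming the `pre-commit` package is installed and the commands are run from the repository root; the `run_formatters` helper name is made up for illustration.

```python
# Minimal sketch (not part of this commit): install and run the hooks defined
# in .pre-commit-config.yaml. Assumes the `pre-commit` CLI is on PATH and the
# script is executed from the repository root.
import subprocess


def run_formatters() -> None:  # hypothetical helper name
    # Register the hooks so they run automatically on every `git commit`.
    subprocess.run(["pre-commit", "install"], check=True)
    # Apply all hooks to every tracked file once.
    subprocess.run(["pre-commit", "run", "--all-files"], check=True)


if __name__ == "__main__":
    run_formatters()
```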
.vscode/settings.json ADDED
@@ -0,0 +1,21 @@
+{
+  "[python]": {
+    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.formatOnType": true,
+    "editor.codeActionsOnSave": {
+      "source.organizeImports": true
+    }
+  },
+  "black-formatter.args": [
+    "--line-length=119"
+  ],
+  "isort.args": ["--profile", "black"],
+  "flake8.args": [
+    "--max-line-length=119"
+  ],
+  "ruff.args": [
+    "--line-length=119"
+  ],
+  "editor.formatOnSave": true,
+  "files.insertFinalNewline": true
+}
LICENSE CHANGED
@@ -6,4 +6,4 @@ The above copyright notice and this permission notice shall be included in all c
 
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
-Portions of code and models (such as pretrained checkpoints, which are fine-tuned starting from released Stable Diffusion checkpoints) are derived from the Stable Diffusion codebase (https://github.com/CompVis/stable-diffusion). Further restrictions may apply. Please consult the Stable Diffusion license `stable_diffusion/LICENSE`. Modified code is denoted as such in comments at the start of each file.
+Portions of code and models (such as pretrained checkpoints, which are fine-tuned starting from released Stable Diffusion checkpoints) are derived from the Stable Diffusion codebase (https://github.com/CompVis/stable-diffusion). Further restrictions may apply. Please consult the Stable Diffusion license `stable_diffusion/LICENSE`. Modified code is denoted as such in comments at the start of each file.
README.md CHANGED
@@ -10,16 +10,16 @@ pinned: false
 ### [Project Page](https://www.timothybrooks.com/instruct-pix2pix/) | [Paper](https://arxiv.org/abs/2211.09800) | [Data](http://instruct-pix2pix.eecs.berkeley.edu/)
 PyTorch implementation of InstructPix2Pix, an instruction-based image editing model, based on the original [CompVis/stable_diffusion](https://github.com/CompVis/stable-diffusion) repo. <br>
 
-[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://www.timothybrooks.com/instruct-pix2pix/)
+[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://www.timothybrooks.com/instruct-pix2pix/)
 [Tim Brooks](https://www.timothybrooks.com/)\*,
 [Aleksander Holynski](https://holynski.org/)\*,
 [Alexei A. Efros](https://people.eecs.berkeley.edu/~efros/) <br>
 UC Berkeley <br>
-\*denotes equal contribution
-
+\*denotes equal contribution
+
 <img src='https://instruct-pix2pix.timothybrooks.com/teaser.jpg'/>
 
-## TL;DR: quickstart
+## TL;DR: quickstart
 
 Set up a conda environment, and download a pretrained model:
 ```
@@ -38,7 +38,7 @@ python edit_cli.py --input imgs/example.jpg --output imgs/output.jpg --edit "tur
 
 Or launch your own interactive editing Gradio app:
 ```
-python edit_app.py
+python edit_app.py
 ```
 ![Edit app](https://github.com/timothybrooks/instruct-pix2pix/blob/main/imgs/edit_app.jpg?raw=true)
 
@@ -80,9 +80,9 @@ InstructPix2Pix is trained by fine-tuning from an initial StableDiffusion checkp
 ```
 bash scripts/download_pretrained_sd.sh
 ```
-If you'd like to use a different checkpoint, point to it in the config file `configs/train.yaml`, on line 8, after `ckpt_path:`.
+If you'd like to use a different checkpoint, point to it in the config file `configs/train.yaml`, on line 8, after `ckpt_path:`.
 
-Next, we need to change the config to point to our downloaded (or generated) dataset. If you're using the `clip-filtered-dataset` from above, you can skip this. Otherwise, you may need to edit lines 85 and 94 of the config (`data.params.train.params.path`, `data.params.validation.params.path`).
+Next, we need to change the config to point to our downloaded (or generated) dataset. If you're using the `clip-filtered-dataset` from above, you can skip this. Otherwise, you may need to edit lines 85 and 94 of the config (`data.params.train.params.path`, `data.params.validation.params.path`).
 
 Finally, start a training job with the following command:
 
@@ -101,7 +101,7 @@ We provide our generated dataset of captions and edit instructions [here](https:
 
 #### (1.1) Manually write a dataset of instructions and captions
 
-The first step of the process is fine-tuning GPT-3. To do this, we made a dataset of 700 examples broadly covering of edits that we might want our model to be able to perform. Our examples are available [here](https://instruct-pix2pix.eecs.berkeley.edu/human-written-prompts.jsonl). These should be diverse and cover a wide range of possible captions and types of edits. Ideally, they should avoid duplication or significant overlap of captions and instructions. It is also important to be mindful of limitations of Stable Diffusion and Prompt-to-Prompt in writing these examples, such as inability to perform large spatial transformations (e.g., moving the camera, zooming in, swapping object locations).
+The first step of the process is fine-tuning GPT-3. To do this, we made a dataset of 700 examples broadly covering of edits that we might want our model to be able to perform. Our examples are available [here](https://instruct-pix2pix.eecs.berkeley.edu/human-written-prompts.jsonl). These should be diverse and cover a wide range of possible captions and types of edits. Ideally, they should avoid duplication or significant overlap of captions and instructions. It is also important to be mindful of limitations of Stable Diffusion and Prompt-to-Prompt in writing these examples, such as inability to perform large spatial transformations (e.g., moving the camera, zooming in, swapping object locations).
 
 Input prompts should closely match the distribution of input prompts used to generate the larger dataset. We sampled the 700 input prompts from the _LAION Improved Aesthetics 6.5+_ dataset and also use this dataset for generating examples. We found this dataset is quite noisy (many of the captions are overly long and contain irrelevant text). For this reason, we also considered MSCOCO and LAION-COCO datasets, but ultimately chose _LAION Improved Aesthetics 6.5+_ due to its diversity of content, proper nouns, and artistic mediums. If you choose to use another dataset or combination of datasets as input to GPT-3 when generating examples, we recommend you sample the input prompts from the same distribution when manually writing training examples.
 
@@ -211,6 +211,3 @@ If you're not getting the quality result you want, there may be a few reasons:
 year={2022}
 }
 ```
-
-
-
edit_app.py CHANGED
@@ -5,9 +5,8 @@ import random
 
 import gradio as gr
 import torch
-from PIL import Image, ImageOps
 from diffusers import StableDiffusionInstructPix2PixPipeline
-
+from PIL import Image, ImageOps
 
 help_text = """
 If you're not getting what you want, there may be a few reasons:
@@ -46,8 +45,11 @@ example_instructions = [
 
 model_id = "timbrooks/instruct-pix2pix"
 
+
 def main():
-    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None).to("cuda")
+    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
+        model_id, torch_dtype=torch.float16, safety_checker=None
+    ).to("cuda")
     example_image = Image.open("imgs/example.jpg").convert("RGB")
 
     def load_example(
@@ -96,9 +98,12 @@ def main():
 
         generator = torch.manual_seed(seed)
         edited_image = pipe(
-            instruction, image=input_image,
-            guidance_scale=text_cfg_scale, image_guidance_scale=image_cfg_scale,
-            num_inference_steps=steps, generator=generator,
+            instruction,
+            image=input_image,
+            guidance_scale=text_cfg_scale,
+            image_guidance_scale=image_cfg_scale,
+            num_inference_steps=steps,
+            generator=generator,
         ).images[0]
         return [seed, text_cfg_scale, image_cfg_scale, edited_image]
 
@@ -106,14 +111,16 @@ def main():
         return [0, "Randomize Seed", 1371, "Fix CFG", 7.5, 1.5, None]
 
     with gr.Blocks() as demo:
-        gr.HTML("""<h1 style="font-weight: 900; margin-bottom: 7px;">
+        gr.HTML(
+            """<h1 style="font-weight: 900; margin-bottom: 7px;">
 InstructPix2Pix: Learning to Follow Image Editing Instructions
 </h1>
 <p>For faster inference without waiting in queue, you may duplicate the space and upgrade to GPU in settings.
 <br/>
 <a href="https://huggingface.co/spaces/timbrooks/instruct-pix2pix?duplicate=true">
 <img style="margin-top: 0em; margin-bottom: 0em" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a>
-<p/>""")
+<p/>"""
+        )
         with gr.Row():
             with gr.Column(scale=1, min_width=100):
                 generate_button = gr.Button("Generate")
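Note (not part of the commit): the reformatted `from_pretrained(...)` and `pipe(...)` calls above use the diffusers InstructPix2Pix pipeline directly. Below is a minimal standalone sketch of the same calls outside the Gradio app, assuming a CUDA GPU, the packages from `requirements.txt` (plus Pillow), and the `imgs/example.jpg` image the app loads; the instruction text, step count, and output path are illustrative, not values from this commit.

```python
# Minimal standalone sketch of the pipeline call reformatted in edit_app.py.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16, safety_checker=None
).to("cuda")

input_image = Image.open("imgs/example.jpg").convert("RGB")  # example image used by the app
generator = torch.manual_seed(1371)  # the app's default seed

edited_image = pipe(
    "make it look like a watercolor painting",  # illustrative edit instruction
    image=input_image,
    guidance_scale=7.5,        # text CFG, the app's default
    image_guidance_scale=1.5,  # image CFG, the app's default
    num_inference_steps=50,    # assumed step count; the app passes its `steps` value here
    generator=generator,
).images[0]
edited_image.save("output.jpg")  # hypothetical output path
```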
requirements.txt CHANGED
@@ -1,6 +1,6 @@
 -f --extra-index-url https://download.pytorch.org/whl/cu116
+git+https://github.com/huggingface/diffusers
+numpy
 torch
 torchvision
-numpy
 transformers
-git+https://github.com/huggingface/diffusers