rsortino committed on
Commit
d098f09
1 Parent(s): e8d4cbc

Update README.md

Files changed (1)
  1. README.md +30 -0
README.md CHANGED
@@ -1,3 +1,33 @@
  ---
  license: apache-2.0
+ datasets:
+ - multi-train/coco_captions_1107
+ - visual_genome
+ language:
+ - en
+ pipeline_tag: text-to-image
+ tags:
+ - scene_graph
+ - transformers
+ - laplacian
+ - autoregressive
+ - vqvae
  ---
+
+ # trf-sg2im
+
+ Model card for the paper __"[Transformer-Based Image Generation from Scene Graphs](https://arxiv.org/abs/2303.04634)"__.
+ Original GitHub implementation at [perceivelab/trf-sg2im](https://github.com/perceivelab/trf-sg2im).
+
+ ![teaser](docs/teaser.gif)
+
+ ## Model
+
+ This model is a two-stage scene-graph-to-image approach. It takes a scene graph as input and generates a layout using a transformer-based architecture with Laplacian Positional Encoding.
+ Then, it uses this estimated layout to condition an autoregressive GPT-like transformer that composes the image in a discrete latent space; a VQVAE decodes the result into the final image.
+
+ ![architecture](docs/architecture.png)
+
+ ## Usage
+ For usage instructions, please refer to the original [GitHub repo](https://github.com/perceivelab/trf-sg2im).
+
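
The Model section of the README in the diff above mentions Laplacian Positional Encoding without showing what it computes. The sketch below illustrates one common formulation: each node's encoding is built from the eigenvectors of the symmetric normalized graph Laplacian with the smallest non-zero eigenvalues. It is not taken from the trf-sg2im code. Treating the scene graph as undirected, the toy `objects` and `triples`, the choice of `k`, and the helper names `normalized_laplacian`, `jacobi_eigh`, and `laplacian_positional_encoding` are all assumptions made for illustration. A small Jacobi eigenvalue solver is used only to keep the example dependency-free.

```python
import math


def normalized_laplacian(num_nodes, edges):
    """Return L = I - D^{-1/2} A D^{-1/2} for an undirected, unweighted graph."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        if i != j:  # ignore self-loops
            adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
         for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def jacobi_eigh(matrix, max_sweeps=100, tol=1e-20):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues ascending, eigenvectors as columns of a row-major matrix).
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (update columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (update rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    order = sorted(range(n), key=lambda i: a[i][i])
    eigenvalues = [a[i][i] for i in order]
    eigenvectors = [[v[row][i] for i in order] for row in range(n)]
    return eigenvalues, eigenvectors


def laplacian_positional_encoding(num_nodes, edges, k):
    """Return a k-dimensional encoding per node, skipping the trivial eigenvector."""
    lap = normalized_laplacian(num_nodes, edges)
    _, eigenvectors = jacobi_eigh(lap)
    return [row[1:k + 1] for row in eigenvectors]


# Hypothetical toy scene graph: (subject index, predicate, object index).
objects = ["sky", "tree", "sheep", "grass"]
triples = [(0, "above", 1), (1, "behind", 2), (2, "standing on", 3), (1, "on", 3)]
edges = [(s, o) for s, _, o in triples]  # assumption: edge direction is dropped

pe = laplacian_positional_encoding(len(objects), edges, k=2)
for name, vec in zip(objects, pe):
    print(name, [round(x, 3) for x in vec])
```

In a model of this kind, such vectors would typically be added to or concatenated with the node embeddings before the transformer layers. Eigenvectors are only defined up to sign, so implementations often flip signs at random during training; whether trf-sg2im does so is not stated in the README.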