DisPositioNet: Disentangled Pose and Identity in Semantic Image
Manipulation
- URL: http://arxiv.org/abs/2211.05499v1
- Date: Thu, 10 Nov 2022 11:47:37 GMT
- Title: DisPositioNet: Disentangled Pose and Identity in Semantic Image
Manipulation
- Authors: Azade Farshad, Yousef Yeganeh, Helisa Dhamo, Federico Tombari, Nassir
Navab
- Abstract summary: DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
- Score: 83.51882381294357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph representation of objects and their relations in a scene, known as a
scene graph, provides a precise and discernible interface to manipulate a scene
by modifying the nodes or the edges in the graph. Although existing works have
shown promising results in modifying the placement and pose of objects, scene
manipulation often leads to losing some visual characteristics like the
appearance or identity of objects. In this work, we propose DisPositioNet, a
model that learns a disentangled representation for each object for the task of
image manipulation using scene graphs in a self-supervised manner. Our
framework enables the disentanglement of the variational latent embeddings as
well as the feature representation in the graph. In addition to producing more
realistic images due to the decomposition of features like pose and identity,
our method takes advantage of the probabilistic sampling in the intermediate
features to generate more diverse images in object replacement or addition
tasks. The results of our experiments show that disentangling the feature
representations in the latent manifold of the model outperforms the previous
works qualitatively and quantitatively on two public benchmarks. Project Page:
https://scenegenie.github.io/DispositioNet/
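To make the core idea concrete, here is a minimal sketch of disentangled variational node embeddings. This is not the authors' implementation; all module names and dimensions are hypothetical. Each object's feature is split into separate pose and identity latents, each a diagonal Gaussian sampled with the reparameterization trick:

```python
# Minimal sketch (hypothetical names/sizes, not DisPositioNet's code):
# a node encoder that disentangles each object's embedding into separate
# variational pose and identity latents.
import torch
import torch.nn as nn

class DisentangledNodeEncoder(nn.Module):
    def __init__(self, feat_dim=256, z_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # Separate heads so pose and identity live in independent latents.
        self.pose_head = nn.Linear(256, 2 * z_dim)      # mu and log-variance
        self.identity_head = nn.Linear(256, 2 * z_dim)  # mu and log-variance

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, node_feats):
        h = self.backbone(node_feats)
        return (self.reparameterize(self.pose_head(h)),
                self.reparameterize(self.identity_head(h)))

enc = DisentangledNodeEncoder()
z_pose, z_identity = enc(torch.randn(5, 256))  # 5 objects in the scene graph
# Changing a node's pose while preserving its appearance then amounts to
# resampling or replacing z_pose while keeping z_identity fixed.
```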
Related papers
- Joint Generative Modeling of Scene Graphs and Images via Diffusion
Models [37.788957749123725]
We present a novel generative task: the joint generation of scene graphs and images.
We introduce DiffuseSG, a novel diffusion model that jointly models the adjacency matrix along with heterogeneous node and edge attributes.
Using a graph transformer as the denoiser, DiffuseSG successively denoises the scene graph representation in continuous space and discretizes the final representation to produce a clean scene graph.
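The final discretization step can be illustrated with a small sketch (hypothetical shapes and thresholds, not DiffuseSG's code): denoised continuous node/edge representations are snapped to discrete labels, and the relaxed adjacency is thresholded into a binary matrix.

```python
# Sketch of the discretization step only (hypothetical, not the paper's code).
import torch

def discretize_scene_graph(node_logits, edge_logits, soft_adj, thresh=0.5):
    node_labels = node_logits.argmax(dim=-1)   # (N,) object class ids
    edge_labels = edge_logits.argmax(dim=-1)   # (N, N) predicate ids
    adjacency = (soft_adj > thresh).long()     # (N, N) binary edge presence
    adjacency.fill_diagonal_(0)                # no self-relations
    return node_labels, edge_labels, adjacency

n = 4
nodes, edges, adj = discretize_scene_graph(
    torch.randn(n, 10), torch.randn(n, n, 7), torch.rand(n, n))
```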
arXiv Detail & Related papers (2024-01-02T10:10:29Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
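As a rough illustration of a feature-space discriminator (not the paper's architecture; the backbone choice and head sizes are assumptions), realism can be scored on the features of a frozen pretrained network rather than on raw pixels:

```python
# Minimal sketch of a "semantic discriminator" on pretrained features.
# The backbone here is an untrained stand-in; the paper's setup differs.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=None)  # stand-in for a pretrained model
backbone.fc = nn.Identity()               # expose 512-d pooled features
for p in backbone.parameters():
    p.requires_grad = False               # keep the feature space fixed

head = nn.Sequential(nn.Linear(512, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

def semantic_disc_score(images):
    with torch.no_grad():
        feats = backbone(images)          # (B, 512) fixed visual features
    return head(feats)                    # (B, 1) real/fake logit

logits = semantic_disc_score(torch.randn(2, 3, 224, 224))
```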
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation and also introduces dynamic conditioning on the image.
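The iterative-refinement idea can be sketched as follows (hypothetical, not the paper's model): entity and predicate estimates are updated jointly over several steps, conditioning on each other, instead of fixing a single factorization order.

```python
# Toy joint-refinement loop (illustrative only; all modules are assumptions).
import torch
import torch.nn as nn

class RefineStep(nn.Module):
    def __init__(self, d=128):
        super().__init__()
        self.ent_update = nn.GRUCell(d, d)
        self.rel_update = nn.GRUCell(2 * d, d)

    def forward(self, ent, rel, pair_idx):
        # Predicates condition on their current subject/object entities...
        subj, obj = ent[pair_idx[:, 0]], ent[pair_idx[:, 1]]
        rel = self.rel_update(torch.cat([subj, obj], dim=-1), rel)
        # ...and entities condition back on the aggregated predicate state.
        ent = self.ent_update(rel.mean(dim=0, keepdim=True).expand_as(ent), ent)
        return ent, rel

step = RefineStep()
ent = torch.randn(3, 128)                # 3 detected entities
pairs = torch.tensor([[0, 1], [1, 2]])   # candidate (subject, object) pairs
rel = torch.randn(2, 128)
for _ in range(3):                       # a few refinement iterations
    ent, rel = step(ent, rel, pairs)
```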
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- Complex Scene Image Editing by Scene Graph Comprehension [17.72638225034884]
We propose SGC-Net, a two-stage method for complex scene image editing by scene graph comprehension.
In the first stage, we train a Region of Interest (RoI) prediction network that uses scene graphs to predict the locations of the target objects.
The second stage uses a conditional diffusion model to edit the image based on our RoI predictions.
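The two-stage flow can be summarized with stub functions (a high-level sketch, not SGC-Net's code): stage 1 predicts a target box from the edited scene graph; stage 2 inpaints that region with a conditional diffusion model.

```python
# Stub pipeline illustrating the two stages; both functions are placeholders.
import torch

def predict_roi(scene_graph, image):
    """Stage 1 stub: return an (x1, y1, x2, y2) box for the target object."""
    return torch.tensor([64, 64, 160, 160])

def diffusion_inpaint(image, box, condition):
    """Stage 2 stub: regenerate only the region inside `box`, conditioned
    on the scene graph. A real model would run iterative denoising here."""
    x1, y1, x2, y2 = box.tolist()
    edited = image.clone()
    edited[:, y1:y2, x1:x2] = torch.randn_like(edited[:, y1:y2, x1:x2])
    return edited

image = torch.rand(3, 256, 256)
graph = {"nodes": ["person", "horse"], "edges": [("person", "riding", "horse")]}
edited = diffusion_inpaint(image, predict_roi(graph, image), graph)
```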
arXiv Detail & Related papers (2022-03-24T05:12:54Z)
- Scene Graph Generation for Better Image Captioning? [48.411957217304]
We propose a model that leverages detected objects and auto-generated visual relationships to describe images in natural language.
We generate a scene graph from raw image pixels by identifying individual objects and visual relationships between them.
This scene graph then serves as input to our graph-to-text model, which generates the final caption.
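A toy illustration of the graph-to-text interface (not the paper's model): a detected scene graph can be linearized into triplet strings that a sequence-to-sequence captioner consumes.

```python
# Linearize a scene graph into text for a graph-to-text captioning model.
def linearize_scene_graph(nodes, edges):
    parts = [f"{s} {pred} {o}" for (s, pred, o) in edges]
    # Append any isolated objects that appear in no relationship.
    parts += [n for n in nodes if not any(n in (s, o) for (s, _, o) in edges)]
    return " ; ".join(parts)

nodes = ["man", "frisbee", "dog"]
edges = [("man", "throwing", "frisbee"), ("dog", "chasing", "frisbee")]
print(linearize_scene_graph(nodes, edges))
# -> "man throwing frisbee ; dog chasing frisbee"
```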
arXiv Detail & Related papers (2021-09-23T14:35:11Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learns from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
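The pseudo-labeling step can be sketched in a simplified form (the paper's matching is more involved; this version keeps only triplets whose subject and object were both detected):

```python
# Simplified pseudo-label creation: align detector outputs with
# concepts parsed from the caption to get weak scene-graph supervision.
detections = [("person", (10, 20, 80, 200)), ("surfboard", (60, 150, 220, 210))]
parsed_triplets = [("person", "riding", "surfboard"),
                   ("person", "wearing", "wetsuit")]

def pseudo_label(detections, triplets):
    detected = {label for label, _ in detections}
    # Keep a triplet only if both its subject and object were detected.
    return [t for t in triplets if t[0] in detected and t[2] in detected]

print(pseudo_label(detections, parsed_triplets))
# -> [('person', 'riding', 'surfboard')]
```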
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
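A toy autoregressive sampler over labelled directed graphs gives the flavor of the approach (illustrative only; SceneGraphGen uses learned distributions, whereas the choices below are uniform and made up):

```python
# Node-by-node graph sampling: each new node and each of its edges to
# earlier nodes is drawn conditioned on the graph generated so far.
import random

NODE_VOCAB = ["person", "table", "cup", "<stop>"]
EDGE_VOCAB = ["on", "holding", None]  # None = no edge between the pair

def sample_graph(max_nodes=6, seed=0):
    rng = random.Random(seed)
    nodes, edges = [], []
    for _ in range(max_nodes):
        label = rng.choice(NODE_VOCAB)      # stand-in for p(node | graph)
        if label == "<stop>":
            break
        for j in range(len(nodes)):
            pred = rng.choice(EDGE_VOCAB)   # stand-in for p(edge | graph)
            if pred is not None:
                edges.append((label, pred, nodes[j]))
        nodes.append(label)
    return nodes, edges

print(sample_graph())
```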
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Scene Graph Generation via Conditional Random Fields [14.282277071380447]
We propose a novel scene graph generation model for predicting object instances and their corresponding relationships in an image.
Our model, SG-CRF, efficiently learns the sequential order of subject and object in a relationship triplet, and the semantic compatibility of object instance nodes and relationship nodes in a scene graph.
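A simplified illustration of the compatibility idea (not SG-CRF itself; modules and dimensions are assumptions): score a (subject, predicate, object) triplet by combining unary per-node terms with a pairwise term over the ordered pair and its predicate, as a CRF-style energy would.

```python
# Toy CRF-style triplet scoring: unary terms per node plus a pairwise
# term over the ordered (subject, predicate, object) combination.
import torch

def triplet_score(subj_emb, pred_emb, obj_emb, unary, pairwise):
    u = unary(subj_emb) + unary(obj_emb)
    p = pairwise(torch.cat([subj_emb, pred_emb, obj_emb], dim=-1))
    return u + p

d = 32
unary = torch.nn.Linear(d, 1)
pairwise = torch.nn.Linear(3 * d, 1)
s, r, o = torch.randn(1, d), torch.randn(1, d), torch.randn(1, d)
print(triplet_score(s, r, o, unary, pairwise))
```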
arXiv Detail & Related papers (2018-11-20T04:55:07Z)