PRISM: Progressive Restoration for Scene Graph-based Image Manipulation
- URL: http://arxiv.org/abs/2311.02247v1
- Date: Fri, 3 Nov 2023 21:30:34 GMT
- Title: PRISM: Progressive Restoration for Scene Graph-based Image Manipulation
- Authors: Pavel Jahoda, Azade Farshad, Yousef Yeganeh, Ehsan Adeli, Nassir Navab
- Abstract summary: PRISM is a novel multi-head image manipulation approach to improve the accuracy and quality of the manipulated regions in the scene.
Our results demonstrate the potential of our approach for enhancing the quality and precision of scene graph-based image manipulation.
- Score: 47.77003316561398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graphs have emerged as accurate descriptive priors for image generation
and manipulation tasks; however, the complexity and diversity of object shapes
and relations in the data make it challenging to incorporate them into models
and to generate high-quality results. To address these challenges, we
propose PRISM, a novel progressive multi-head image manipulation approach to
improve the accuracy and quality of the manipulated regions in the scene. Our
image manipulation framework is trained using an end-to-end denoising masked
reconstruction proxy task, where the masked regions are progressively unmasked
from the outer regions to the inner part. We take advantage of the outer parts
of the masked area, as they correlate directly with the context of the scene.
Moreover, our multi-head architecture simultaneously generates detailed
object-specific regions in addition to the entire image to produce
higher-quality images. Our model outperforms the state-of-the-art methods in
the semantic image manipulation task on the CLEVR and Visual Genome datasets.
Our results demonstrate the potential of our approach for enhancing the quality
and precision of scene graph-based image manipulation.
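The abstract's core idea, progressively unmasking a masked region from its outer rings inward, can be illustrated with a small mask-schedule sketch. This is a hypothetical illustration and not PRISM's actual training code; the function name and the ring-based schedule are assumptions for demonstration only.

```python
import numpy as np

def progressive_masks(h, w, steps):
    """Build a sequence of binary masks over an h x w region that unmask
    it from the outside in: step 0 hides everything, the final step
    reveals everything. Convention: 1 = masked (hidden), 0 = visible."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Distance of each pixel to the nearest border of the region
    dist_to_edge = np.minimum.reduce([ys, xs, h - 1 - ys, w - 1 - xs])
    max_d = dist_to_edge.max()
    masks = []
    for t in range(steps + 1):
        # Reveal outer rings first: pixels closer to the edge than the
        # growing threshold become visible at earlier steps
        thresh = (max_d + 1) * t / steps
        masks.append((dist_to_edge >= thresh).astype(np.uint8))
    return masks

masks = progressive_masks(8, 8, 4)
areas = [int(m.sum()) for m in masks]  # masked area shrinks toward the center
```

For an 8x8 region with 4 steps, the masked area shrinks monotonically from the full region down to zero, mirroring the outer-to-inner reconstruction order the paper describes.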
Related papers
- Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation [44.457347230146404]
We leverage the scene graph, a powerful structured representation, for complex image generation.
We employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner.
Our method outperforms recent competitors based on text, layout, or scene graph.
arXiv Detail & Related papers (2024-10-01T07:02:46Z)
- Sketch-guided Image Inpainting with Partial Discrete Diffusion Process [5.005162730122933]
We introduce a novel partial discrete diffusion process (PDDP) for sketch-guided inpainting.
PDDP corrupts the masked regions of the image and reconstructs these masked regions conditioned on hand-drawn sketches.
A novel transformer module models the reverse diffusion process, accepting two inputs: the image containing the masked region to be inpainted and the query sketch.
arXiv Detail & Related papers (2024-04-18T07:07:38Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet demonstrates superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing the full image with high-resolution image crops that provide finer-grained image details.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- Joint Learning of Deep Texture and High-Frequency Features for Computer-Generated Image Detection [24.098604827919203]
We propose a joint learning strategy with deep texture and high-frequency features for CG image detection.
A semantic segmentation map is generated to guide the affine transformation operation.
The original image, combined with the high-frequency components of the original and rendered images, is fed into a multi-branch neural network equipped with attention mechanisms.
arXiv Detail & Related papers (2022-09-07T17:30:40Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Domain Adaptation with Morphologic Segmentation [8.0698976170854]
We present a novel domain adaptation framework that uses morphologic segmentation to translate images from arbitrary input domains (real and synthetic) into a uniform output domain.
Our goal is to establish a preprocessing step that unifies data from multiple sources into a common representation.
We showcase the effectiveness of our approach by qualitatively and quantitatively evaluating our method on four data sets of simulated and real data of urban scenes.
arXiv Detail & Related papers (2020-06-16T17:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.