Look here! A parametric learning based approach to redirect visual
attention
- URL: http://arxiv.org/abs/2008.05413v1
- Date: Wed, 12 Aug 2020 16:08:36 GMT
- Title: Look here! A parametric learning based approach to redirect visual
attention
- Authors: Youssef Alami Mejjati and Celso F. Gomez and Kwang In Kim and Eli
Shechtman and Zoya Bylinskii
- Abstract summary: We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
- Score: 49.609412873346386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Across photography, marketing, and website design, being able to direct the
viewer's attention is a powerful tool. Motivated by professional workflows, we
introduce an automatic method to make an image region more attention-capturing
via subtle image edits that maintain realism and fidelity to the original. From
an input image and a user-provided mask, our GazeShiftNet model predicts a
distinct set of global parametric transformations to be applied to the
foreground and background image regions separately. We present the results of
quantitative and qualitative experiments that demonstrate improvements over
prior state-of-the-art. In contrast to existing attention shifting algorithms,
our global parametric approach better preserves image semantics and avoids
typical generative artifacts. Our edits enable inference at interactive rates
on any image size, and easily generalize to videos. Extensions of our model
allow for multi-style edits and the ability to both increase and attenuate
attention in an image region. Furthermore, users can customize the edited
images by dialing the edits up or down via interpolations in parameter space.
This paper presents a practical tool that can simplify future image editing
pipelines.
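The abstract describes predicting separate global parametric transformations for the foreground and background regions, compositing them via the user mask, and dialing edits up or down by interpolating in parameter space. A minimal sketch of that pipeline, using hypothetical stand-in parameters (exposure and saturation) in place of the learned transformations GazeShiftNet actually predicts:

```python
import numpy as np

def apply_parametric_edit(image, exposure=0.0, saturation=1.0):
    """Apply simple global parametric transforms (hypothetical stand-ins
    for the transformations the model would predict)."""
    out = image * (2.0 ** exposure)          # exposure as an EV-style gain
    gray = out.mean(axis=-1, keepdims=True)  # per-pixel luminance proxy
    out = gray + saturation * (out - gray)   # scale chroma around gray
    return np.clip(out, 0.0, 1.0)

def edit_with_mask(image, mask, fg_params, bg_params, strength=1.0):
    """Edit foreground and background separately, then composite by mask.
    `strength` interpolates each parameter set toward identity, mirroring
    the paper's dialing of edits up or down in parameter space."""
    identity = {"exposure": 0.0, "saturation": 1.0}
    def lerp(params):  # interpolate parameters, not pixels
        return {k: identity[k] + strength * (v - identity[k])
                for k, v in params.items()}
    fg = apply_parametric_edit(image, **lerp(fg_params))
    bg = apply_parametric_edit(image, **lerp(bg_params))
    m = mask[..., None].astype(image.dtype)
    return m * fg + (1.0 - m) * bg
```

Because the edits are global per region, this runs at interactive rates regardless of image size, and setting `strength=0.0` recovers the original image exactly.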
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z)
- PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery [10.594261300488546]
We introduce a novel framework for progressive exemplar-driven editing with off-the-shelf diffusion models, dubbed PIXELS.
PIXELS provides granular control over edits, allowing adjustments at the pixel or region level.
We demonstrate that PIXELS delivers high-quality edits efficiently, leading to a notable improvement in quantitative metrics as well as human evaluation.
arXiv Detail & Related papers (2025-01-16T20:26:30Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- End-to-End Visual Editing with a Generatively Pre-Trained Artist [78.5922562526874]
We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.
We propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain.
We show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture.
arXiv Detail & Related papers (2022-05-03T17:59:30Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.