Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
- URL: http://arxiv.org/abs/2401.18085v1
- Date: Wed, 31 Jan 2024 18:59:59 GMT
- Title: Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
- Authors: Daniel Geng, Andrew Owens
- Abstract summary: Motion guidance is a technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move.
We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.
- Score: 19.853978560075305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are capable of generating impressive images conditioned on
text descriptions, and extensions of these models allow users to edit images at
a relatively coarse scale. However, the ability to precisely edit the layout,
position, pose, and shape of objects in images with diffusion models is still
difficult. To this end, we propose motion guidance, a zero-shot technique that
allows a user to specify dense, complex motion fields that indicate where each
pixel in an image should move. Motion guidance works by steering the diffusion
sampling process with the gradients through an off-the-shelf optical flow
network. Specifically, we design a guidance loss that encourages the sample to
have the desired motion, as estimated by a flow network, while also being
visually similar to the source image. By simultaneously sampling from a
diffusion model and guiding the sample to have low guidance loss, we can obtain
a motion-edited image. We demonstrate that our technique works on complex
motions and produces high quality edits of real and generated images.
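The guidance idea in the abstract can be illustrated with a toy sketch: a loss that combines a flow-matching term with a visual-similarity term, whose gradient steers the sample. This is a minimal stand-in, not the paper's code; the real method backpropagates through an off-the-shelf optical flow network during diffusion sampling, while here a hypothetical 1D "flow estimator" (a linearized brightness-constancy fit recovering a single scalar shift) and a finite-difference gradient are used so the example is self-contained.

```python
import numpy as np

# Toy sketch of motion guidance (hypothetical stand-ins, not the paper's code).

def estimate_flow(sample, source):
    """Linearized 1D flow: solve min_u || grad(source) * u + (sample - source) ||^2."""
    g = np.gradient(source)
    return -np.sum(g * (sample - source)) / (np.sum(g * g) + 1e-8)

def guidance_loss(sample, source, target_flow, color_weight=0.1):
    """Flow term pulls the estimated motion toward the user's target flow;
    color term keeps the sample visually close to the source image."""
    flow_term = (estimate_flow(sample, source) - target_flow) ** 2
    color_term = color_weight * np.mean((sample - source) ** 2)
    return flow_term + color_term

def guided_step(sample, source, target_flow, step=0.05, eps=1e-4):
    """One guidance update: finite-difference gradient of the loss w.r.t. the
    sample (the paper uses autograd through the flow network instead)."""
    grad = np.zeros_like(sample)
    base = guidance_loss(sample, source, target_flow)
    for i in range(sample.size):
        bumped = sample.copy()
        bumped[i] += eps
        grad[i] = (guidance_loss(bumped, source, target_flow) - base) / eps
    return sample - step * grad

rng = np.random.default_rng(0)
source = np.sin(np.linspace(0, 2 * np.pi, 64))
sample = source + 0.1 * rng.standard_normal(64)  # stands in for a noisy diffusion sample
target_flow = 0.5                                # user-specified motion (scalar shift)

before = guidance_loss(sample, source, target_flow)
for _ in range(20):
    sample = guided_step(sample, source, target_flow)
after = guidance_loss(sample, source, target_flow)
print(before > after)  # → True: guidance drives the loss down
```

In the full method this gradient is added to the diffusion model's denoising update at each sampling step, so the generated image simultaneously stays on the image manifold and satisfies the user-specified motion field.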
Related papers
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
- Editable Image Elements for Controllable Synthesis [79.58148778509769]
We propose an image representation that promotes spatial editing of input images using a diffusion model.
We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition.
arXiv Detail & Related papers (2024-04-24T17:59:11Z)
- Move Anything with Layered Scene Diffusion [77.45870343845492]
We propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process.
Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts.
Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations.
arXiv Detail & Related papers (2024-04-10T17:28:16Z)
- Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos [32.74215702447293]
We propose a generative model that synthesizes a photorealistic output that follows a prescribed layout.
Our method transfers fine details from the original image and preserves the identity of its parts.
We show that by using simple segmentations and coarse 2D manipulations, we can synthesize a photorealistic edit faithful to the user's input.
arXiv Detail & Related papers (2024-03-19T17:59:58Z)
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer [19.5025303182983]
Video-based human pose transfer is a video-to-video generation task that animates a plain source human image based on a series of target human poses.
We propose a novel Deformable Motion Modulation (DMM) that utilizes geometric kernel offset with adaptive weight modulation to simultaneously perform discontinuous feature alignment and style transfer.
arXiv Detail & Related papers (2023-07-15T09:24:45Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image can be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently moving objects captured with an event-based camera.
The method performs on par or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.