DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
- URL: http://arxiv.org/abs/2403.15382v2
- Date: Sun, 28 Jul 2024 19:45:06 GMT
- Title: DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
- Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
- Abstract summary: We introduce DragAPart, a method that, given an image and a set of drags, generates a new image of the same object that responds to the action of those drags.
Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
- Score: 67.97235923372035
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce DragAPart, a method that, given an image and a set of drags as input, generates a new image of the same object that responds to the action of the drags. Differently from prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion model, not restricted to a specific kinematic structure or object category. We start from a pre-trained image generator and fine-tune it on a new synthetic dataset, Drag-a-Move, which we introduce. Combined with a new encoding for the drags and dataset randomization, the model generalizes well to real images and different categories. Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
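The abstract describes conditioning a fine-tuned image generator on a sparse set of drags via a new drag encoding. The sketch below is purely illustrative and is not the released DragAPart code: the `Drag` class, the `encode_drags` function, and the four-channel-per-drag rasterization are assumptions chosen to make the input/output contract concrete.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class Drag:
    source: Tuple[int, int]  # (x, y) pixel where the drag starts (on a part)
    target: Tuple[int, int]  # (x, y) pixel the dragged part should move toward


def encode_drags(drags: List[Drag], height: int, width: int) -> np.ndarray:
    """Rasterize a sparse set of drags into a dense conditioning tensor.

    One plausible encoding (a stand-in for the paper's actual drag encoding):
    four channels per drag, holding the normalized start and end coordinates
    at the drag's source pixel and zeros everywhere else.
    """
    cond = np.zeros((4 * len(drags), height, width), dtype=np.float32)
    for i, d in enumerate(drags):
        (sx, sy), (tx, ty) = d.source, d.target
        cond[4 * i + 0, sy, sx] = sx / width
        cond[4 * i + 1, sy, sx] = sy / height
        cond[4 * i + 2, sy, sx] = tx / width
        cond[4 * i + 3, sy, sx] = ty / height
    return cond


# Example: a single drag that pulls a drawer front 40 pixels to the right.
cond = encode_drags([Drag(source=(120, 200), target=(160, 200))],
                    height=256, width=256)
```

In a setup like the one the abstract describes, such a conditioning tensor could be concatenated channel-wise with the inputs of the pre-trained image generator during fine-tuning on Drag-a-Move; the exact architecture and encoding are specified in the paper, not here.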
Related papers
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer [27.278989809466392]
We present a new method for text-driven motion transfer - synthesizing a video that complies with an input text prompt describing the target objects and scene.
We leverage a pre-trained and fixed text-to-video diffusion model, which provides us with generative and motion priors.
arXiv Detail & Related papers (2023-11-28T18:03:27Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Affordance Diffusion: Synthesizing Hand-Object Interactions [81.98499943996394]
Given an RGB image of an object, we aim to hallucinate plausible images of a human hand interacting with it.
We propose a two-step generative approach: a LayoutNet that samples an articulation-agnostic hand-object-interaction layout, and a ContentNet that synthesizes images of a hand grasping the object.
arXiv Detail & Related papers (2023-03-21T17:59:10Z)
- Is an Object-Centric Video Representation Beneficial for Transfer? [86.40870804449737]
We introduce a new object-centric video recognition model built on a transformer architecture.
We show that the object-centric model outperforms prior video representations.
arXiv Detail & Related papers (2022-07-20T17:59:44Z)
- Click to Move: Controlling Video Generation with Sparse Motion [30.437648200928603]
Click to Move (C2M) is a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks.
Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user.
It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input.
arXiv Detail & Related papers (2021-08-19T17:33:13Z)
- Motion Representations for Articulated Animation [34.54825980226596]
We propose novel motion representations for animating articulated objects consisting of distinct parts.
In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes.
Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks.
arXiv Detail & Related papers (2021-04-22T18:53:56Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)