Motion Representations for Articulated Animation
- URL: http://arxiv.org/abs/2104.11280v1
- Date: Thu, 22 Apr 2021 18:53:56 GMT
- Title: Motion Representations for Articulated Animation
- Authors: Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai and
Sergey Tulyakov
- Abstract summary: We propose novel motion representations for animating articulated objects consisting of distinct parts.
In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes.
Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks.
- Score: 34.54825980226596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose novel motion representations for animating articulated objects
consisting of distinct parts. In a completely unsupervised manner, our method
identifies object parts, tracks them in a driving video, and infers their
motions by considering their principal axes. In contrast to previous
keypoint-based works, our method extracts meaningful and consistent regions
that describe location, shape, and pose. The regions correspond to semantically
relevant and distinct object parts that are more easily detected in frames of
the driving video. To force decoupling of foreground from background, we model
non-object related global motion with an additional affine transformation. To
facilitate animation and prevent the leakage of the shape of the driving
object, we disentangle the shape and pose of objects in the region space. Our model
can animate a variety of objects, surpassing previous methods by a large margin
on existing benchmarks. We present a challenging new benchmark with
high-resolution videos and show that the improvement is particularly pronounced
when articulated objects are considered, reaching 96.6% user preference vs. the
state of the art.
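
As an illustration of the representation described in the abstract, the sketch below reduces a soft region heatmap to a mean location and principal axes via weighted PCA. It is a schematic approximation of the idea, not the authors' implementation; the NumPy formulation and all variable names are assumptions.

```python
import numpy as np

def region_mean_and_axes(heatmap):
    """Weighted mean and principal axes of a soft region heatmap.

    heatmap: (H, W) non-negative array (normalized to sum to 1 here).
    Returns the region centre (2,) and a (2, 2) matrix whose columns are the
    principal axes, each scaled by the standard deviation along that axis.
    """
    h, w = heatmap.shape
    p = heatmap / (heatmap.sum() + 1e-8)

    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys], axis=-1).astype(np.float64)    # (H, W, 2) pixel coordinates

    mean = (p[..., None] * coords).reshape(-1, 2).sum(axis=0)  # weighted centroid, (2,)

    centred = coords - mean                                    # (H, W, 2)
    cov = np.einsum("hw,hwi,hwj->ij", p, centred, centred)     # (2, 2) weighted covariance

    eigvals, eigvecs = np.linalg.eigh(cov)                     # eigenvalues in ascending order
    axes = eigvecs * np.sqrt(np.maximum(eigvals, 1e-8))        # axes scaled by std deviation
    return mean, axes
```

Comparing the mean and principal axes of the same region in the source and driving frames yields a per-part motion; the additional global affine transformation mentioned in the abstract plays the analogous role for the background, decoupling non-object motion from the parts.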
Related papers
- Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion [9.134743677331517]
We propose using a pre-trained image-to-video model to disentangle appearance from motion.
Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input.
By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve high temporal motion granularity.
Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks.
arXiv Detail & Related papers (2024-08-01T10:55:20Z) - Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer [27.278989809466392]
We present a new method for text-driven motion transfer - synthesizing a video that complies with an input text prompt describing the target objects and scene.
We leverage a pre-trained and fixed text-to-video diffusion model, which provides us with generative and motion priors.
arXiv Detail & Related papers (2023-11-28T18:03:27Z) - InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study the instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z) - MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field [42.236015785792965]
We present MovingParts, a NeRF-based method for dynamic scene reconstruction and part discovery.
Under the Lagrangian view, we parameterize the scene motion by tracking the trajectory of particles on objects.
The Lagrangian view makes it convenient to discover parts by factorizing the scene motion as a composition of part-level rigid motions.
arXiv Detail & Related papers (2023-03-10T05:06:30Z) - Unsupervised Multi-object Segmentation by Predicting Probable Motion
Patterns [92.80981308407098]
We propose a new approach to learn to segment multiple image objects without manual supervision.
The method can extract objects from still images, but uses videos for supervision.
We show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks.
arXiv Detail & Related papers (2022-10-21T17:57:05Z) - The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video contains different views of the same scene related by moving components, and that the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically, without building them up from low-level edges or optical flow, respectively (see the sketch after this list).
arXiv Detail & Related papers (2021-11-11T18:59:11Z) - NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static
Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z) - First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.