Gaussian See, Gaussian Do: Semantic 3D Motion Transfer from Multiview Video
- URL: http://arxiv.org/abs/2511.14848v1
- Date: Tue, 18 Nov 2025 19:02:50 GMT
- Authors: Yarin Bekor, Gal Michael Harari, Or Perel, Or Litany
- Abstract summary: We present a novel approach for semantic 3D motion transfer from multiview video. We extract motion embeddings from source videos via condition inversion, apply them to rendered frames, and use the resulting videos to supervise dynamic 3D Gaussian Splatting reconstruction. We establish the first benchmark for semantic 3D motion transfer and demonstrate superior motion fidelity and structural consistency compared to adapted baselines.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present Gaussian See, Gaussian Do, a novel approach for semantic 3D motion transfer from multiview video. Our method enables rig-free, cross-category motion transfer between objects with semantically meaningful correspondence. Building on implicit motion transfer techniques, we extract motion embeddings from source videos via condition inversion, apply them to rendered frames of static target shapes, and use the resulting videos to supervise dynamic 3D Gaussian Splatting reconstruction. Our approach introduces an anchor-based view-aware motion embedding mechanism, ensuring cross-view consistency and accelerating convergence, along with a robust 4D reconstruction pipeline that consolidates noisy supervision videos. We establish the first benchmark for semantic 3D motion transfer and demonstrate superior motion fidelity and structural consistency compared to adapted baselines. Code and data for this paper available at https://gsgd-motiontransfer.github.io/
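The anchor-based view-aware motion embedding described in the abstract can be illustrated with a minimal numpy sketch: a single shared anchor embedding (recovered once via condition inversion) is combined with small view-dependent offsets so that every camera receives a consistent variant of the same motion signal. The function name `view_aware_embeddings`, the linear projection `proj`, and all tensor shapes below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def view_aware_embeddings(anchor, view_dirs, proj):
    """Toy version of an anchor-based view-aware motion embedding.

    anchor:    (d,) shared motion embedding for the source motion
    view_dirs: (V, 3) unit viewing directions, one per camera
    proj:      (3, d) hypothetical projection mixing view geometry
               into the embedding space
    Returns a (V, d) array of per-view embeddings that all stay close
    to the shared anchor, which is the cross-view consistency the
    mechanism is meant to provide.
    """
    offsets = view_dirs @ proj            # (V, d) view-dependent offsets
    return anchor[None, :] + offsets      # broadcast anchor to every view

rng = np.random.default_rng(0)
d, num_views = 8, 4
anchor = rng.normal(size=d)
view_dirs = rng.normal(size=(num_views, 3))
view_dirs /= np.linalg.norm(view_dirs, axis=1, keepdims=True)
proj = 0.1 * rng.normal(size=(3, d))      # small offsets keep views consistent

emb = view_aware_embeddings(anchor, view_dirs, proj)
print(emb.shape)                          # prints (4, 8)
```

Because every per-view embedding is the anchor plus a small geometry-derived offset, conditioning each rendered view on its own embedding keeps the supervision videos mutually consistent, which is what lets the downstream 4D reconstruction consolidate them.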
Related papers
- Motion Marionette: Rethinking Rigid Motion Transfer via Prior Guidance [26.642143303176997]
Motion Marionette is a framework for rigid motion transfer from monocular source videos to single-view target images. Motion trajectories are extracted from the source video to construct a spatial-temporal (SpaT) prior. The resulting velocity field can be flexibly employed for efficient video production.
arXiv Detail & Related papers (2025-11-25T04:34:42Z) - DIMO: Diverse 3D Motion Generation for Arbitrary Objects [57.14954351767432]
DIMO is a generative approach capable of producing diverse 3D motions for arbitrary objects from a single image. We leverage the rich priors in well-trained video models to extract common motion patterns. At inference time, we can instantly sample diverse 3D motions from the learned latent space in a single forward pass.
arXiv Detail & Related papers (2025-11-10T18:56:49Z) - In-2-4D: Inbetweening from Two Single-View Images to 4D Generation [63.68181731564576]
We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from two single-view images. In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation.
arXiv Detail & Related papers (2025-04-11T09:01:09Z) - H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting [39.2960379257236]
Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques.
arXiv Detail & Related papers (2024-08-23T12:51:49Z) - Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation [47.203483017875726]
We introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos.
Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories.
Our technique offers specific and high-quality motion transfer, maintaining both shape integrity and temporal consistency.
arXiv Detail & Related papers (2024-05-27T05:49:12Z) - SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z) - Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction.
Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow.
For the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z) - MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It transfers body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z) - Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.