ProMotion: Prototypes As Motion Learners
- URL: http://arxiv.org/abs/2406.04999v1
- Date: Fri, 7 Jun 2024 15:10:33 GMT
- Title: ProMotion: Prototypes As Motion Learners
- Authors: Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan,
- Abstract summary: We introduce ProMotion, a unified prototypical framework engineered to model fundamental motion tasks.
ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms.
We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion.
- Score: 46.08051377180652
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we introduce ProMotion, a unified prototypical framework engineered to model fundamental motion tasks. ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms. We adopt a prototypical perspective, establishing a unified paradigm that harmonizes disparate motion learning approaches. This novel paradigm streamlines the architectural design, enabling the simultaneous assimilation of diverse motion information. We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion. This approach effectively circumvents the pitfalls of ambiguity in pixel-wise feature matching, significantly bolstering the robustness of motion representation. We demonstrate a profound degree of transferability across distinct motion patterns. This inherent versatility reverberates robustly across a comprehensive spectrum of both 2D and 3D downstream tasks. Empirical results demonstrate that ProMotion outperforms various well-known specialized architectures, achieving 0.54 and 0.054 Abs Rel error on the Sintel and KITTI depth datasets, 1.04 and 2.01 average endpoint error on the clean and final pass of Sintel flow benchmark, and 4.30 F1-all error on the KITTI flow benchmark. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
Related papers
- MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation [4.386035726986601]
How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge.
We propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds.
We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments.
arXiv Detail & Related papers (2024-08-20T07:30:00Z) - Prototypical Transformer as Unified Motion Learners [38.31482767855841]
Prototypical Transformer (ProtoFormer) is a framework that approaches various motion tasks from a prototype perspective.
ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics.
arXiv Detail & Related papers (2024-06-03T17:41:28Z) - Large Motion Model for Unified Multi-Modal Motion Generation [50.56268006354396]
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
LMM tackles these challenges from three principled aspects.
arXiv Detail & Related papers (2024-04-01T17:55:11Z) - MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z) - Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply internative, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.