DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation
- URL: http://arxiv.org/abs/2307.16803v1
- Date: Mon, 31 Jul 2023 16:14:24 GMT
- Title: DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation
- Authors: Yue Zhang and Hehe Fan and Yi Yang and Mohan Kankanhalli
- Abstract summary: We present our findings from the research conducted on the Human-Object Interaction 4D (HOI4D) dataset for the egocentric action segmentation task.
We convert point cloud videos into depth videos and employ traditional video modeling methods to improve 4D action segmentation.
The proposed method achieved first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
- Score: 39.806610397357986
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this technical report, we present our findings from the research conducted
on the Human-Object Interaction 4D (HOI4D) dataset for the egocentric action
segmentation task. As a relatively novel research area, point cloud video
methods might not be good at temporal modeling, especially for long point cloud
videos (e.g., 150 frames). In contrast, traditional video understanding methods
have been well developed. Their effectiveness in temporal modeling has been
widely verified on many large-scale video datasets. Therefore, we convert point
cloud videos into depth videos and employ traditional video modeling methods to
improve 4D action segmentation. By ensembling depth and point cloud video
methods, the accuracy is significantly improved. The proposed method, named
Mixture of Depth and Point Cloud Video Experts (DPMix), achieved first place in
the 4D Action Segmentation Track of the HOI4D Challenge 2023.
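
The report does not include implementation details, but the two ingredients it names, point-cloud-to-depth conversion and expert ensembling, can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the function names, the pinhole intrinsics fx/fy/cx/cy, the z-buffer rasterization, and the equal-weight softmax averaging are not taken from the paper.

```python
import numpy as np

def point_cloud_to_depth(points_xyz, fx, fy, cx, cy, height, width):
    """Render one point-cloud frame (camera-space XYZ, in meters) into a depth
    image via pinhole projection with a simple z-buffer.

    The intrinsics and the nearest-point z-buffer rule are assumptions; the
    report does not specify how its depth videos are produced.
    """
    depth = np.zeros((height, width), dtype=np.float32)
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    valid = z > 0
    x, y, z = x[valid], y[valid], z[valid]
    u = np.round(fx * x / z + cx).astype(np.int64)
    v = np.round(fy * y / z + cy).astype(np.int64)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    order = np.argsort(-z)            # far-to-near, so nearer points overwrite
    depth[v[order], u[order]] = z[order]
    return depth

def ensemble_action_labels(depth_logits, point_logits, depth_weight=0.5):
    """Late fusion of the two experts' per-frame predictions.

    depth_logits, point_logits: arrays of shape (num_frames, num_classes) from
    the depth-video expert and the point-cloud-video expert. Softmax averaging
    with equal weights is an illustrative choice, not necessarily DPMix's
    exact fusion rule.
    """
    def softmax(a):
        a = a - a.max(axis=-1, keepdims=True)
        e = np.exp(a)
        return e / e.sum(axis=-1, keepdims=True)

    probs = depth_weight * softmax(depth_logits) \
        + (1.0 - depth_weight) * softmax(point_logits)
    return probs.argmax(axis=-1)      # one action label per frame
```

Under these assumptions, a 150-frame HOI4D clip would be handled by rendering each frame with point_cloud_to_depth, feeding the resulting depth video to a conventional video model, running a point-cloud-video model on the raw sequence, and fusing the two sets of per-frame logits with ensemble_action_labels.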
Related papers
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting [94.84688557937123]
Video-3DGS is a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors.
Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos.
It enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
arXiv Detail & Related papers (2024-06-04T17:57:37Z) - DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos [21.93514516437402]
We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via novel view synthesis.
Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks.
We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study.
arXiv Detail & Related papers (2024-05-03T17:55:34Z) - X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos
through Cross-modal Knowledge Transfer [28.719098240737605]
We propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer.
It enhances 4D-Scene understanding by transferring texture priors from RGB sequences using a Transformer architecture with temporal relationship mining.
Experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks.
arXiv Detail & Related papers (2023-12-12T15:48:12Z) - Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z) - Masked Spatio-Temporal Structure Prediction for Self-supervised Learning
on Point Cloud Videos [75.9251839023226]
We propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations.
MaST-Pre consists of two self-supervised learning tasks. First, by reconstructing masked point tubes, our method is able to capture appearance information of point cloud videos.
Second, to learn motion, we propose a temporal cardinality difference prediction task that estimates the change in the number of points within a point tube (a minimal sketch of this target appears after this list).
arXiv Detail & Related papers (2023-08-18T02:12:54Z) - NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
Video depth estimation aims to infer temporally consistent depth.
We introduce NVDS+ that stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner.
We also introduce a large-scale Video Depth in the Wild dataset, which contains 14,203 videos with over two million frames.
arXiv Detail & Related papers (2023-07-17T17:57:01Z) - Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud
Sequence Representation Learning [14.033085586047799]
This paper proposes a new 4D self-supervised pre-training method called Complete-to-Partial 4D Distillation.
Our key idea is to formulate 4D self-supervised representation learning as a teacher-student knowledge distillation framework.
Experiments show that this approach significantly outperforms previous pre-training approaches on a wide range of 4D point cloud sequence understanding tasks.
arXiv Detail & Related papers (2022-12-10T16:26:19Z) - Learning Fine-Grained Motion Embedding for Landscape Animation [140.57889994591494]
We propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding.
To train and evaluate on diverse time-lapse videos, we build the largest high-resolution time-lapse video dataset with diverse scenes.
Our method achieves relative improvements of 19% on LPIPS and 5.6% on FVD compared with state-of-the-art methods on our dataset.
arXiv Detail & Related papers (2021-09-06T02:47:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.