Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud
Sequence Representation Learning
- URL: http://arxiv.org/abs/2212.05330v2
- Date: Tue, 13 Dec 2022 03:02:10 GMT
- Title: Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud
Sequence Representation Learning
- Authors: Zhuoyang Zhang, Yuhao Dong, Yunze Liu and Li Yi
- Abstract summary: This paper proposes a new 4D self-supervised pre-training method called Complete-to-Partial 4D Distillation.
Our key idea is to formulate 4D self-supervised representation learning as a teacher-student knowledge distillation framework.
Experiments show that this approach significantly outperforms previous pre-training approaches on a wide range of 4D point cloud sequence understanding tasks.
- Score: 14.033085586047799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on 4D point cloud sequences has attracted a lot of attention.
However, obtaining exhaustively labeled 4D datasets is often expensive and
laborious, so it is especially important to investigate how to utilize raw
unlabeled data. Most existing self-supervised point cloud representation
learning methods only consider geometry from a static snapshot, omitting the
fact that sequential observations of dynamic scenes could reveal more
comprehensive geometric details, while video representation learning
frameworks mostly model motion as image-space flows and are not
3D-geometry-aware. To overcome these issues, this paper proposes a new 4D
self-supervised pre-training method called Complete-to-Partial 4D Distillation.
Our key idea is to formulate 4D self-supervised representation learning as a
teacher-student knowledge distillation framework and let the student learn
useful 4D representations under the guidance of the teacher. Experiments show
that this approach significantly outperforms previous pre-training approaches
on a wide range of 4D point cloud sequence understanding tasks, including
indoor and outdoor scenarios.
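
To make the teacher-student idea concrete, below is a minimal PyTorch-style sketch of one
complete-to-partial distillation step. It assumes, as the method's name suggests, that the
teacher encodes a complete point cloud sequence while the student sees only a partial
observation; the Encoder4D backbone, the EMA teacher update, and the cosine feature-alignment
loss are illustrative assumptions rather than the paper's actual architecture or objective.

    # Hypothetical sketch of complete-to-partial 4D distillation (not the authors' code).
    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder4D(nn.Module):
        """Placeholder 4D backbone: maps a point cloud sequence (B, T, N, 3) to features (B, D)."""
        def __init__(self, feat_dim: int = 256):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim))

        def forward(self, seq: torch.Tensor) -> torch.Tensor:
            feats = self.mlp(seq)            # (B, T, N, D) per-point features
            feats = feats.max(dim=2).values  # (B, T, D)   pool over points
            return feats.mean(dim=1)         # (B, D)      pool over time

    class CompleteToPartialDistiller(nn.Module):
        def __init__(self, feat_dim: int = 256, ema: float = 0.99):
            super().__init__()
            self.student = Encoder4D(feat_dim)
            self.teacher = copy.deepcopy(self.student)  # teacher tracks the student via EMA
            for p in self.teacher.parameters():
                p.requires_grad_(False)
            self.ema = ema

        @torch.no_grad()
        def update_teacher(self) -> None:
            for pt, ps in zip(self.teacher.parameters(), self.student.parameters()):
                pt.mul_(self.ema).add_(ps, alpha=1.0 - self.ema)

        def forward(self, complete_seq: torch.Tensor, partial_seq: torch.Tensor) -> torch.Tensor:
            with torch.no_grad():
                target = self.teacher(complete_seq)  # teacher sees the complete sequence
            pred = self.student(partial_seq)         # student sees only a partial observation
            # Distillation loss: align student features with the teacher's (negative cosine similarity).
            return -F.cosine_similarity(pred, target, dim=-1).mean()

A training loop would back-propagate this loss through the student only and call
update_teacher() after every optimizer step, so the teacher remains a slowly moving average
of the student.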
Related papers
- Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer [28.719098240737605]
We propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer.
It enhances 4D scene understanding by transferring texture priors from RGB sequences, using a Transformer architecture with temporal relationship mining.
Experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks.
arXiv Detail & Related papers (2023-12-12T15:48:12Z)
- NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding [20.79861588128133]
We introduce a generic online 4D perception paradigm called NSM4D.
NSM4D serves as a plug-and-play strategy that can be adapted to existing 4D backbones.
We demonstrate significant improvements on various online perception benchmarks in indoor and outdoor settings.
arXiv Detail & Related papers (2023-10-12T13:42:49Z)
- Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z)
- 4D-Net for Learned Multi-Modal Alignment [87.58354992455891]
We present 4D-Net, a 3D object detection approach that utilizes both 3D point cloud and RGB sensing information over time.
We incorporate the 4D information by learning novel connections across different feature representations and levels of abstraction, as well as by observing geometric constraints.
arXiv Detail & Related papers (2021-09-02T16:35:00Z)
- Auto4D: Learning to Label 4D Objects from Sequential Point Clouds [89.30951657004408]
We propose an automatic pipeline that generates accurate object trajectories in 3D space from LiDAR point clouds.
The key idea is to decompose the 4D object label into two parts: the object's 3D size, which is fixed over time for rigid objects, and the motion path describing the evolution of the object's pose through time.
Given the cheap but noisy input, our model produces higher-quality 4D labels by re-estimating the object size and smoothing the motion path (see the sketch after this list).
arXiv Detail & Related papers (2021-01-17T04:23:05Z)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling this scenario.
Our method consists of an instance segmentation module followed by a pose estimation one.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
arXiv Detail & Related papers (2020-11-23T08:05:28Z)
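
As referenced in the Auto4D entry above, the following sketch illustrates the idea of
decomposing a 4D object label into a single time-invariant 3D size plus a per-frame motion
path that is then smoothed. The data layout and the median/moving-average refinement are
assumptions made for illustration, not the Auto4D pipeline itself.

    # Illustrative 4D label decomposition (assumed layout, not Auto4D's actual code).
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ObjectLabel4D:
        size: np.ndarray  # (3,)  length, width, height: fixed over time for a rigid object
        path: np.ndarray  # (T, 4) per-frame x, y, z, heading: the object's motion path

    def decompose_and_refine(noisy_boxes: np.ndarray, window: int = 5) -> ObjectLabel4D:
        """noisy_boxes: (T, 7) per-frame boxes [x, y, z, l, w, h, heading] from a cheap detector."""
        # Re-estimate a single object size as the per-dimension median over all frames.
        size = np.median(noisy_boxes[:, 3:6], axis=0)
        # Smooth the motion path with a centered moving average
        # (heading wraparound is ignored here for brevity).
        path = noisy_boxes[:, [0, 1, 2, 6]]
        smoothed = np.empty_like(path)
        half = window // 2
        T = path.shape[0]
        for t in range(T):
            lo, hi = max(0, t - half), min(T, t + half + 1)
            smoothed[t] = path[lo:hi].mean(axis=0)
        return ObjectLabel4D(size=size, path=smoothed)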