TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
- URL: http://arxiv.org/abs/2407.09751v1
- Date: Sat, 13 Jul 2024 03:00:16 GMT
- Title: TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
- Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang
- Abstract summary: We propose a Temporal LiDAR Aggregation and Distillation (TLAD) algorithm, which leverages historical priors to assign different aggregation steps for different classes.
To make full use of temporal images, we design a Temporal Image Aggregation and Fusion (TIAF) module, which can greatly expand the camera FOV.
We also develop a Static-Moving Switch Augmentation (SMSA) algorithm, which utilizes sufficient temporal information to enable objects to switch their motion states freely.
- Score: 80.13343299606146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative temporal images. To fully exploit rich information hidden in long-term temporal point clouds and images, we present the Temporal Aggregation Network, termed TASeg. Specifically, we propose a Temporal LiDAR Aggregation and Distillation (TLAD) algorithm, which leverages historical priors to assign different aggregation steps for different classes. It can largely reduce memory and time overhead while achieving higher accuracy. Besides, TLAD trains a teacher injected with gt priors to distill the model, further boosting the performance. To make full use of temporal images, we design a Temporal Image Aggregation and Fusion (TIAF) module, which can greatly expand the camera FOV and enhance the present features. Temporal LiDAR points in the camera FOV are used as mediums to transform temporal image features to the present coordinate for temporal multi-modal fusion. Moreover, we develop a Static-Moving Switch Augmentation (SMSA) algorithm, which utilizes sufficient temporal information to enable objects to switch their motion states freely, thus greatly increasing static and moving training samples. Our TASeg ranks 1st on three challenging tracks, i.e., SemanticKITTI single-scan track, multi-scan track and nuScenes LiDAR segmentation track, strongly demonstrating the superiority of our method. Codes are available at https://github.com/LittlePey/TASeg.
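The abstract describes the TIAF mechanism concretely: temporal LiDAR points visible in a past camera frame index that frame's image features, and the ego-motion transform carries those points, together with their sampled features, to the present coordinate frame. The sketch below illustrates this points-as-mediums idea under stated assumptions; the function names, argument conventions, and the nearest-neighbor sampling are illustrative choices, not the authors' implementation (see the repository above for that).
```python
import numpy as np

def transform_points(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform T to an (N, 3) point array."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return (homo @ T.T)[:, :3]

def gather_temporal_image_features(points_past, feat_map, K,
                                   T_cam_from_lidar, T_cur_from_past):
    """Points-as-mediums sketch (assumed API, not the authors' code).

    points_past      : (N, 3) LiDAR points from a past sweep, in that sweep's frame
    feat_map         : (C, H, W) image feature map from the matching past camera frame
    K                : (3, 3) camera intrinsics
    T_cam_from_lidar : (4, 4) LiDAR-to-camera extrinsics for the past frame
    T_cur_from_past  : (4, 4) ego-motion transform, past LiDAR frame -> current frame
    """
    C, H, W = feat_map.shape
    # 1. Project past LiDAR points into the past camera to index its features.
    cam_pts = transform_points(points_past, T_cam_from_lidar)
    in_front = cam_pts[:, 2] > 0.1                  # keep points ahead of the camera
    uvw = cam_pts[in_front] @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    u = uv[valid, 0].astype(int)
    v = uv[valid, 1].astype(int)
    feats = feat_map[:, v, u].T                     # (M, C), nearest-neighbor sampling
    # 2. Carry the same points to the present coordinate frame via ego motion,
    #    so their image features can be fused with current-frame LiDAR features.
    pts_cur = transform_points(points_past[in_front][valid], T_cur_from_past)
    return pts_cur, feats
```
TLAD's class-wise aggregation would then control how many past sweeps feed such a step for each semantic class; the historical priors that drive this assignment are the paper's contribution and are not reproduced here.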
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking [2.487142846438629]
3D single object tracking within LiDAR point clouds is a pivotal task in computer vision.
Existing methods, which depend solely on appearance matching via networks or utilize information from successive frames, encounter significant challenges.
We design an innovative cross-frame bi-temporal motion tracker, named STMD-Tracker, to mitigate these challenges.
arXiv Detail & Related papers (2024-03-23T13:15:44Z) - T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning [22.002220932086693]
We propose an effective pre-training strategy, namely Temporal Masked Auto-Encoders (T-MAE).
T-MAE takes as input temporally adjacent frames and learns temporal dependency.
Our T-MAE pre-training strategy alleviates the demand for annotated data.
arXiv Detail & Related papers (2023-12-15T21:30:49Z) - PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
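The storage claim in the PTT summary above is easy to make concrete: keeping a compact per-object trajectory state instead of raw past point clouds shrinks the memory bank by orders of magnitude. Below is a hypothetical back-of-the-envelope sketch; the dataclass fields, history length, and point counts are assumptions for illustration, not the paper's actual memory-bank design.
```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TrajectoryState:
    """Compact per-frame object state (fields assumed for illustration)."""
    x: float
    y: float
    z: float      # box center
    l: float
    w: float
    h: float      # box size
    yaw: float    # heading
    score: float  # detection confidence

@dataclass
class ObjectMemory:
    """Per-object memory bank storing a short trajectory instead of raw points."""
    history: deque = field(default_factory=lambda: deque(maxlen=8))

    def update(self, state: TrajectoryState) -> None:
        self.history.append(state)

# Rough comparison (assumed numbers): 8 frames x 8 floats x 4 bytes = 256 bytes
# per object, versus ~10,000 raw points x 4 floats x 4 bytes = 160 KB per object
# per frame if full past point clouds were banked.
```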
arXiv Detail & Related papers (2023-12-13T18:59:13Z) - Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST delivers better performance than existing state-of-the-art methods by convincing margins.
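The efficiency argument in the DiST summary above rests on a familiar pattern: if the massive pre-trained spatial encoder is frozen, gradients only flow through a small temporal branch. The generic PyTorch sketch below shows that pattern under stated assumptions; the module names and the GRU-based temporal branch are illustrative stand-ins, not DiST's actual integration branch.
```python
import torch
import torch.nn as nn

class DisentangledVideoHead(nn.Module):
    """Generic frozen-spatial / trainable-temporal pattern (not DiST's design)."""
    def __init__(self, spatial_encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.spatial = spatial_encoder.eval()
        for p in self.spatial.parameters():     # no back-prop through the backbone
            p.requires_grad_(False)
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, C, H, W) -> per-frame features without gradients
        B, T = frames.shape[:2]
        with torch.no_grad():
            feats = self.spatial(frames.flatten(0, 1))   # (B*T, feat_dim)
        feats = feats.view(B, T, -1)
        out, _ = self.temporal(feats)                    # lightweight temporal branch
        return self.classifier(out[:, -1])               # classify from the last step
```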
arXiv Detail & Related papers (2023-09-14T17:58:33Z) - SUIT: Learning Significance-guided Information for 3D Temporal Detection [15.237488449422008]
We learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information as sparse features for information fusion across frames.
We evaluate our method on the large-scale nuScenes dataset, where our SUIT not only significantly reduces the memory and computation cost of temporal fusion, but also performs favorably against state-of-the-art baselines.
arXiv Detail & Related papers (2023-07-04T16:22:10Z) - Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
arXiv Detail & Related papers (2022-07-05T17:59:17Z) - LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds [58.402752909624716]
Existing motion capture datasets are largely short-range and cannot yet meet the needs of long-range applications.
We propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range to overcome this limitation.
Our dataset also includes the ground truth human motions acquired by the IMU system and the synchronous RGB images.
arXiv Detail & Related papers (2022-03-28T12:52:45Z) - LiDAR-based Recurrent 3D Semantic Segmentation with Temporal Memory Alignment [0.0]
We propose a recurrent segmentation architecture (RNN), which takes a single range image frame as input.
An alignment strategy, which we call Temporal Memory Alignment, uses ego motion to temporally align the memory between consecutive frames in feature space.
We demonstrate the benefits of the presented approach on two large-scale datasets and compare it to several state-of-the-art methods.
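The Temporal Memory Alignment step summarized above warps the previous frame's feature memory by the ego motion so that it is spatially consistent with the current frame before recurrent fusion. A minimal sketch for a 2D bird's-eye-view feature grid is given below; the paper operates on range images, so treat the BEV layout, transform convention, and function signature as assumptions for illustration.
```python
import math
import torch
import torch.nn.functional as F

def align_memory_bev(memory: torch.Tensor, dx: float, dy: float, dyaw: float,
                     meters_per_cell: float) -> torch.Tensor:
    """Warp a (B, C, H, W) BEV feature memory by ego motion (dx, dy, dyaw).

    A sketch of temporal memory alignment, not the paper's implementation;
    assumes the ego vehicle sits at the grid center and dyaw is in radians.
    """
    B, C, H, W = memory.shape
    cos, sin = math.cos(dyaw), math.sin(dyaw)
    # 2x3 affine that maps current-frame sampling coordinates back into the old
    # memory, expressed in grid_sample's normalized [-1, 1] coordinates.
    theta = torch.tensor([[cos, -sin, 2.0 * dx / (W * meters_per_cell)],
                          [sin,  cos, 2.0 * dy / (H * meters_per_cell)]],
                         dtype=memory.dtype, device=memory.device)
    theta = theta.unsqueeze(0).expand(B, -1, -1)
    grid = F.affine_grid(theta, list(memory.shape), align_corners=False)
    return F.grid_sample(memory, grid, align_corners=False)
```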
arXiv Detail & Related papers (2021-03-03T09:01:45Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatial and temporal information.
We show that the proposed method achieves superior performance over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.