LAPTNet-FPN: Multi-scale LiDAR-aided Projective Transform Network for
Real Time Semantic Grid Prediction
- URL: http://arxiv.org/abs/2302.06414v1
- Date: Fri, 10 Feb 2023 12:34:28 GMT
- Title: LAPTNet-FPN: Multi-scale LiDAR-aided Projective Transform Network for
Real Time Semantic Grid Prediction
- Authors: Manuel Alejandro Diaz-Zapata (CHROMA), David Sierra González (CHROMA), Özgür Erkent (CHROMA), Jilles Dibangoye (CHROMA), Christian Laugier (CHROMA, E-MOTION, Inria)
- Abstract summary: By fusing information from multiple sensors, robustness can be increased and the computational load for the task can be lowered.
Our multi-scale LiDAR-Aided Perspective Transform network uses information available in point clouds to guide the projection of image features to a top-view representation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic grids can be useful representations of the scene around an
autonomous system. By having information about the layout of the space around
itself, a robot can leverage this type of representation for crucial tasks such
as navigation or tracking. By fusing information from multiple sensors,
robustness can be increased and the computational load for the task can be
lowered, achieving real-time performance. Our multi-scale LiDAR-Aided
Perspective Transform network uses information available in point clouds to
guide the projection of image features to a top-view representation, resulting
in a relative improvement in the state of the art for semantic grid generation
for human (+8.67%) and movable object (+49.07%) classes in the nuScenes
dataset, as well as achieving results close to the state of the art for the
vehicle, drivable area and walkway classes, while performing inference at 25
FPS.
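To make the mechanism concrete, here is a minimal NumPy sketch of LiDAR-aided feature projection: each LiDAR point is projected into the image with the camera intrinsics, the image feature at that pixel is sampled, and the feature is splatted into the bird's-eye-view cell the point occupies. The function name, shapes, coordinate conventions, and per-cell averaging below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lidar_aided_projection(feat_map, points, K, grid_shape=(200, 200),
                           grid_res=0.5, grid_range=50.0):
    """Scatter image features into a BEV grid using LiDAR geometry.

    feat_map : (C, H, W) image feature map.
    points   : (N, 3) LiDAR points in the camera coordinate frame.
    K        : (3, 3) camera intrinsic matrix.
    """
    C, H, W = feat_map.shape
    bev = np.zeros((C,) + grid_shape, dtype=feat_map.dtype)
    counts = np.zeros(grid_shape, dtype=np.int64)

    # Keep only points in front of the camera.
    pts = points[points[:, 2] > 0.1]

    # Pinhole projection: pixel coordinates (u, v) for every LiDAR point.
    uvw = (K @ pts.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    in_img = (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # BEV cell index per point (x lateral, z forward in the camera frame).
    gx = ((pts[:, 0] + grid_range) / grid_res).astype(int)
    gz = (pts[:, 2] / grid_res).astype(int)
    in_grid = (gx >= 0) & (gx < grid_shape[1]) & (gz >= 0) & (gz < grid_shape[0])

    for i in np.flatnonzero(in_img & in_grid):
        bev[:, gz[i], gx[i]] += feat_map[:, v[i], u[i]]  # splat feature
        counts[gz[i], gx[i]] += 1

    # Average features that landed in the same cell.
    nz = counts > 0
    bev[:, nz] /= counts[nz]
    return bev
```

In this setting the LiDAR geometry, rather than a learned depth estimate, decides where each image feature lands in the grid, which is what lets the approach skip depth prediction entirely.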
Related papers
- VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition [17.393105901701098]
This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors.
Our experiments show that our representation is more robust than current solutions to serious domain shifts away from the training data distribution.
arXiv Detail & Related papers (2024-03-14T01:30:28Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- Temporal Embeddings: Scalable Self-Supervised Temporal Representation Learning from Spatiotemporal Data for Multimodal Computer Vision [1.4127889233510498]
A novel approach is proposed to stratify the landscape based on mobility activity time series.
The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling.
arXiv Detail & Related papers (2023-10-16T02:53:29Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [12.713417063678335]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantic segmentation, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
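One plausible reading of "selectively transferring semantic features" is gating detection features by per-pixel semantic confidence, sketched below; this interpretation and every name in it are assumptions, not the paper's actual SWAG module.

```python
import numpy as np

def semantic_weighted_features(det_feat, sem_logits):
    """det_feat: (C, H, W) detection features; sem_logits: (K, H, W)
    per-class semantic scores on the same spatial grid."""
    # Softmax over classes -> per-pixel confidence of the most likely class.
    e = np.exp(sem_logits - sem_logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)
    confidence = probs.max(axis=0)               # (H, W)
    # Re-weight detection features so semantically confident regions dominate.
    return det_feat * confidence[None, :, :]
```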
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- LAPTNet: LiDAR-Aided Perspective Transform Network [0.0]
We present an architecture that fuses LiDAR and camera information to generate semantic grids.
LAPTNet is able to associate features in the camera plane to the bird's eye view without having to predict any depth information about the scene.
arXiv Detail & Related papers (2022-11-14T18:56:02Z)
- Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
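As a rough illustration of the token-sparsification idea: score each token with a small learned predictor and keep only the top-k, so later transformer blocks run on fewer tokens. The scoring head, keep ratio, and names below are toy assumptions, not the paper's exact design.

```python
import numpy as np

def prune_tokens(tokens, w_score, keep_ratio=0.5):
    """tokens: (N, D) token embeddings; w_score: (D,) scoring weights."""
    scores = tokens @ w_score                 # one relevance score per token
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]            # indices of the k highest-scoring tokens
    return tokens[keep], keep

rng = np.random.default_rng(0)
toks = rng.standard_normal((196, 64))         # e.g. 14x14 patch tokens
kept, idx = prune_tokens(toks, rng.standard_normal(64))
print(kept.shape)                             # (98, 64): half the tokens remain
```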
arXiv Detail & Related papers (2022-07-04T17:00:51Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
Applied to the recent transformer-based image recognition model ViT, the approach shows consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
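A bare-bones sketch of one message-passing step over an association graph, where detections and tracks are nodes and edge embeddings are refined from their endpoints; the dimensions, mixing weights, and single update step are illustrative assumptions, not the paper's network.

```python
import numpy as np

def message_passing_step(node_feat, edges, edge_feat, w_msg):
    """node_feat: (N, D); edges: list of (i, j) pairs; edge_feat: (E, D);
    w_msg: (3*D, D) weights mixing [src node, dst node, edge] into a message."""
    new_edge_feat = np.empty_like(edge_feat)
    for e, (i, j) in enumerate(edges):
        msg = np.concatenate([node_feat[i], node_feat[j], edge_feat[e]])
        new_edge_feat[e] = np.tanh(msg @ w_msg)   # updated edge embedding
    return new_edge_feat
```

The refined edge embeddings are what a final scoring layer would turn into match/no-match decisions between detections and tracks.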
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)