Related papers: TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data

TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data

URL: http://arxiv.org/abs/2502.04385v1
Date: Wed, 05 Feb 2025 19:41:06 GMT
Title: TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data
Authors: Naor Cohen, Roy Orfaig, Ben-Zion Bobrovsky,
Abstract summary: Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space.<n>We propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds.
Score: 0.6144680854063939
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed resolution depth, signal, and ambient panoramic 2D images, new opportunities emerge for LiDAR based tasks. In this work, we propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks compared to existing methods like CLIP. By combining advanced LiDAR sensor data with a large pre-trained model, our approach provides a robust and accurate solution for challenging detection scenarios, including real-time applications requiring high accuracy and robustness.

Related papers

PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector [15.8414696386661]
We propose Prompted Foundational 3D Detector (PF3Det), which integrates foundation model encoders and soft prompts to enhance LiDAR-camera feature fusion. PF3Det achieves the state-of-the-art results under limited training data, improving NDS by 1.19% and mAP by 2.42% on the nuScenes dataset.
arXiv Detail & Related papers (2025-04-04T16:11:25Z)
LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation. Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z)
Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection [9.076003184833557]
We propose the LiDAR-Camera Augmentation Network (LCANet), a novel framework that reconstructs LiDAR point cloud data by fusing 2D image features. LCANet fuses data from LiDAR sensors by projecting image features into the 3D space, integrating semantic information into the point cloud data. This fusion effectively compensates for LiDAR's weakness in detecting objects at long distances, which are often represented by sparse points.
arXiv Detail & Related papers (2024-09-23T13:03:31Z)
4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives. To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection [0.7234862895932991]
Recent advances introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional modalities such as cameras to enhance 3D object detection. We present a novel LiDAR-only framework that augments raw scans with dense pseudo point clouds by relying on LiDAR sensors and scene semantics.
arXiv Detail & Related papers (2023-09-16T09:18:47Z)
Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields [63.21420081888606]
Recent Radiance Fields and extensions are proposed to synthesize realistic images from 2D input. We present Point2Pix as a novel point to link the 3D sparse point clouds with 2D dense image pixels.
arXiv Detail & Related papers (2023-03-29T06:26:55Z)
Gait Recognition in Large-scale Free Environment via Single LiDAR [35.684257181154905]
LiDAR's ability to capture depth makes it pivotal for robotic perception and holds promise for real-world gait recognition. We present the Hierarchical Multi-representation Feature Interaction Network (HMRNet) for robust gait recognition. To facilitate LiDAR-based gait recognition research, we introduce FreeGait, a comprehensive gait dataset from large-scale, unconstrained settings.
arXiv Detail & Related papers (2022-11-22T16:05:58Z)
ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection [20.44294678711783]
We propose ImLiDAR, a new 3OD paradigm to narrow the cross-sensor discrepancies by progressively fusing the multi-scale features of camera Images and LiDAR point clouds. First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features. Second, we raise a direct set prediction problem that allows designing an effective set-based detector.
arXiv Detail & Related papers (2022-11-17T13:31:23Z)
LinK3D: Linear Keypoints Representation for 3D LiDAR Point Cloud [18.942933892804028]
We propose a novel 3D feature representation method: Linear Keypoints representation for 3D LiDAR point cloud, called LinK3D. LinK3D shows excellent real-time performance, faster than the sensor frame rate at 10 Hz of a typical rotating LiDAR sensor. Our method can be extended to LiDAR odometry task, and shows good scalability.
arXiv Detail & Related papers (2022-06-13T06:50:56Z)
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. TransFusion achieves state-of-the-art performance on large-scale datasets. We extend the proposed method to the 3D tracking task and achieve the 1st place in the leaderboard of nuScenes tracking.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
Learning to Drop Points for LiDAR Scan Synthesis [5.132259673802809]
Generative modeling of 3D scenes is a crucial topic for aiding mobile robots to improve unreliable observations. Most existing studies on point clouds have focused on small and uniform-density data. 3D LiDAR point clouds widely used in mobile robots are non-trivial to be handled because of the large number of points and varying-density. This paper proposes a novel framework based on generative adversarial networks to synthesize realistic LiDAR data as an improved 2D representation.
arXiv Detail & Related papers (2021-02-23T21:53:14Z)
Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection. The whole architecture facilitates two-stage fusion. Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras. PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs. We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.