TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data
- URL: http://arxiv.org/abs/2502.04385v1
- Date: Wed, 05 Feb 2025 19:41:06 GMT
- Title: TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data
- Authors: Naor Cohen, Roy Orfaig, Ben-Zion Bobrovsky
- Abstract summary: Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space.
We propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds.
- Abstract: Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed-resolution depth, signal, and ambient panoramic 2D images, new opportunities emerge for LiDAR-based tasks. In this work, we propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks compared to existing methods like CLIP. By combining advanced LiDAR sensor data with a large pre-trained model, our approach provides a robust and accurate solution for challenging detection scenarios, including real-time applications requiring high accuracy and robustness.
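As an illustration of how a panoramic depth image relates to a 3D point cloud, the sketch below projects points onto an azimuth/elevation grid. It is not the paper's pipeline: per the abstract, the OS1 sensor produces these images itself. The function name `project_to_panorama`, the 1024 x 128 resolution, and the ±22.5° vertical field of view are assumptions chosen for illustration only.

```python
import math

def project_to_panorama(points, width=1024, height=128,
                        fov_up_deg=22.5, fov_down_deg=-22.5):
    """Project (x, y, z) points into a height x width depth image.

    Hypothetical helper: resolution and field of view are assumed values,
    not taken from the paper. Empty pixels hold 0.0.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)    # vertical angle in [-pi/2, pi/2]
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view
        # Azimuth maps to the column, elevation to the row (top row = fov_up).
        col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = min(height - 1, int((fov_up - elevation) / fov * height))
        # Keep the nearest return when several points share a pixel.
        if depth[row][col] == 0.0 or r < depth[row][col]:
            depth[row][col] = r
    return depth

if __name__ == "__main__":
    image = project_to_panorama([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
    print(image[64][512])
```

A 2D image model could then consume such a grid like any other image, which is the general idea behind working with panoramic LiDAR imagery instead of raw point clouds.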
Related papers
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z) - Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection [9.076003184833557]
We propose the LiDAR-Camera Augmentation Network (LCANet), a novel framework that reconstructs LiDAR point cloud data by fusing 2D image features.
LCANet fuses data from LiDAR sensors by projecting image features into the 3D space, integrating semantic information into the point cloud data.
This fusion effectively compensates for LiDAR's weakness in detecting objects at long distances, which are often represented by sparse points.
arXiv Detail & Related papers (2024-09-23T13:03:31Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection [0.7234862895932991]
Recent advances introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional modalities such as cameras to enhance 3D object detection.
We present a novel LiDAR-only framework that augments raw scans with dense pseudo point clouds by relying on LiDAR sensors and scene semantics.
arXiv Detail & Related papers (2023-09-16T09:18:47Z) - Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields [63.21420081888606]
Recent Radiance Fields and extensions are proposed to synthesize realistic images from 2D input.
We present Point2Pix as a novel point renderer to link 3D sparse point clouds with 2D dense image pixels.
arXiv Detail & Related papers (2023-03-29T06:26:55Z) - ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection [20.44294678711783]
We propose ImLiDAR, a new 3D object detection (3OD) paradigm that narrows cross-sensor discrepancies by progressively fusing the multi-scale features of camera images and LiDAR point clouds.
First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features.
Second, we raise a direct set prediction problem that allows designing an effective set-based detector.
arXiv Detail & Related papers (2022-11-17T13:31:23Z) - TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z) - 3D3L: Deep Learned 3D Keypoint Detection and Description for LiDARs [25.73598441491818]
In this publication, we use a state-of-the-art 2D feature network as a basis for 3D3L, exploiting both intensity and depth of LiDAR range images.
Our results show that these keypoints and descriptors extracted from LiDAR scan images outperform state-of-the-art on different benchmark metrics.
arXiv Detail & Related papers (2021-03-25T13:08:07Z) - Learning to Drop Points for LiDAR Scan Synthesis [5.132259673802809]
Generative modeling of 3D scenes is a crucial topic for aiding mobile robots to improve unreliable observations.
Most existing studies on point clouds have focused on small and uniform-density data.
3D LiDAR point clouds, widely used in mobile robots, are non-trivial to handle because of their large number of points and varying density.
This paper proposes a novel framework based on generative adversarial networks to synthesize realistic LiDAR data as an improved 2D representation.
arXiv Detail & Related papers (2021-02-23T21:53:14Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.