Multimodal Object Detection using Depth and Image Data for Manufacturing Parts
- URL: http://arxiv.org/abs/2411.09062v1
- Date: Wed, 13 Nov 2024 22:43:15 GMT
- Title: Multimodal Object Detection using Depth and Image Data for Manufacturing Parts
- Authors: Nazanin Mahjourian, Vinh Nguyen
- Abstract summary: This work proposes a multi-sensor system combining a red-green-blue (RGB) camera and a 3D point cloud sensor.
A novel multimodal object detection method is developed to process both RGB and depth data.
The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics.
- Score: 1.0819408603463427
- Abstract: Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data from lidars or similar 3D sensors. However, each of these sensors has weaknesses and limitations. Cameras do not have depth perception and 3D sensors typically do not carry color information. These weaknesses can undermine the reliability and robustness of industrial manufacturing systems. To address these challenges, this work proposes a multi-sensor system combining a red-green-blue (RGB) camera and a 3D point cloud sensor. The two sensors are calibrated for precise alignment of the multimodal data captured from the two hardware devices. A novel multimodal object detection method is developed to process both RGB and depth data. This object detector is based on the Faster R-CNN baseline that was originally designed to process only camera images. The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics. More specifically, the multimodal model improves mAP by 13% and raises Mean Precision by 11.8% compared to the RGB-only baseline. Compared to the depth-only baseline, it improves mAP by 78% and raises Mean Precision by 57%. Hence, this method facilitates more reliable and robust object detection in service of smart manufacturing applications.
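The abstract does not spell out the fusion architecture, so the sketch below is a rough illustration only: it adapts a stock torchvision Faster R-CNN (torchvision >= 0.13 assumed) to early RGB-D fusion by widening the backbone stem to four channels. The class count and normalization statistics are placeholders, not values from the paper.
```python
import torch
import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# num_classes is a placeholder for the number of part categories.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=10)

# Widen the ResNet stem from 3 to 4 input channels (RGB + depth).
old = model.backbone.body.conv1
new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    new.weight[:, :3] = old.weight                        # keep the RGB filters
    new.weight[:, 3:] = old.weight.mean(1, keepdim=True)  # init the depth channel
model.backbone.body.conv1 = new

# Extend the normalization statistics to cover the depth channel (assumed values).
model.transform.image_mean = [0.485, 0.456, 0.406, 0.5]
model.transform.image_std = [0.229, 0.224, 0.225, 0.25]

model.eval()
rgbd = torch.rand(4, 480, 640)   # one calibrated, aligned RGB-D frame
detections = model([rgbd])       # list of dicts with 'boxes', 'labels', 'scores'
```
Early fusion is the simplest way to keep the Faster R-CNN pipeline intact; the paper's multimodal detector may instead fuse RGB and depth through separate branches.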
Related papers
- Performance Assessment of Feature Detection Methods for 2-D FS Sonar Imagery [11.23455335391121]
Key challenges include non-uniform lighting and poor visibility in turbid environments.
High-frequency forward-look sonar cameras address these issues.
We evaluate a number of feature detectors using real sonar images from five different sonar devices.
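A minimal sketch of such an evaluation loop, assuming a set of off-the-shelf OpenCV detectors and a hypothetical input file; the paper's actual detector list and metrics are not given in this summary.
```python
import cv2

detectors = {
    "SIFT": cv2.SIFT_create(),
    "ORB": cv2.ORB_create(),
    "BRISK": cv2.BRISK_create(),
    "AKAZE": cv2.AKAZE_create(),
}

img = cv2.imread("sonar_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
img = cv2.equalizeHist(img)  # crude counter to non-uniform ensonification

for name, det in detectors.items():
    keypoints = det.detect(img, None)
    # Mean response is a crude proxy for keypoint strength/stability.
    mean_resp = sum(k.response for k in keypoints) / max(len(keypoints), 1)
    print(f"{name}: {len(keypoints)} keypoints, mean response {mean_resp:.4f}")
```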
arXiv Detail & Related papers (2024-09-11T04:35:07Z)
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, that can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
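The summary does not describe how the sparse points are used; one common baseline, sketched below under that assumption, fits a least-squares scale-and-shift correction to a monocular depth map using the few metric LiDAR returns.
```python
import numpy as np

def correct_monocular_depth(mono_depth, uv, lidar_depth):
    """mono_depth: HxW predicted depth; uv: Nx2 integer pixel coordinates of
    sparse LiDAR returns; lidar_depth: N metric depths at those pixels."""
    pred = mono_depth[uv[:, 1], uv[:, 0]]             # sampled predictions
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # columns: [pred, 1]
    (scale, shift), *_ = np.linalg.lstsq(A, lidar_depth, rcond=None)
    return scale * mono_depth + shift                 # metric-scale depth map

mono = np.random.rand(480, 640) * 10                  # hypothetical network output
pts = np.array([[100, 200], [320, 240], [500, 400]])
gt = np.array([4.2, 7.9, 12.5])                       # sparse metric depths
metric_depth = correct_monocular_depth(mono, pts, gt)
```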
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- Joint object detection and re-identification for 3D obstacle multi-camera systems [47.87501281561605]
This research paper introduces a novel modification to an object detection network that uses camera and lidar information.
It incorporates an additional branch designed for the task of re-identifying objects across adjacent cameras within the same vehicle.
The results underscore the superiority of this method over traditional Non-Maximum Suppression (NMS) techniques.
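A hedged sketch of the idea of replacing box-overlap NMS with embedding matching: detections from adjacent cameras are associated by cosine similarity of re-identification embeddings. The threshold and greedy matching rule are illustrative assumptions.
```python
import numpy as np

def match_across_cameras(emb_a, emb_b, sim_thresh=0.8):
    """emb_a: NxD, emb_b: MxD L2-normalized re-ID embeddings.
    Returns (i, j) pairs judged to be the same physical object."""
    sim = emb_a @ emb_b.T                  # cosine similarity matrix
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))         # greedy best match per detection
        if sim[i, j] >= sim_thresh:
            pairs.append((i, j))
    return pairs

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 128)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = np.vstack([a[2] + 0.05 * rng.normal(size=128), rng.normal(size=(3, 128))])
b /= np.linalg.norm(b, axis=1, keepdims=True)
print(match_across_cameras(a, b))  # expect (2, 0): detection 2 re-identified
```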
arXiv Detail & Related papers (2023-10-09T15:16:35Z)
- Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization [13.473742114288616]
We propose a framework that can autonomously detect and localize objects in a known environment.
The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts.
Experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing.
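As a hedged illustration of the localization step, the sketch below back-projects a detected bounding-box center with the fused depth through a pinhole camera model; the intrinsics are placeholders, and the paper's artifact-management logic is omitted.
```python
import numpy as np

K = np.array([[525.0, 0.0, 320.0],   # fx, 0, cx (hypothetical intrinsics)
              [0.0, 525.0, 240.0],   # 0, fy, cy
              [0.0, 0.0, 1.0]])

def localize(bbox, depth_m, K):
    """bbox: (x1, y1, x2, y2) in pixels; depth_m: fused metric depth at center."""
    u = 0.5 * (bbox[0] + bbox[2])
    v = 0.5 * (bbox[1] + bbox[3])
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # normalized viewing ray
    return depth_m * ray                            # 3D point, camera frame

print(localize((300, 200, 360, 280), depth_m=2.4, K=K))
```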
arXiv Detail & Related papers (2023-07-03T15:51:39Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, fusion for detection can be effectively performed by combining their ROI features.
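A minimal sketch of the fusion step given the assignments (the learned box matching itself is not reproduced here): matched 2D ROI features are concatenated onto the 3D ones and projected back down. The feature dimensions are assumptions.
```python
import torch
import torch.nn as nn

class RoIFusion(nn.Module):
    def __init__(self, c3d=256, c2d=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(c3d + c2d, c3d), nn.ReLU())

    def forward(self, feats_3d, feats_2d, assignment):
        # feats_3d: [N, c3d]; feats_2d: [M, c2d];
        # assignment: [N] index of the matched 2D proposal per 3D proposal.
        matched = feats_2d[assignment]
        return self.proj(torch.cat([feats_3d, matched], dim=1))

fusion = RoIFusion()
fused = fusion(torch.rand(10, 256), torch.rand(20, 256),
               torch.randint(0, 20, (10,)))
print(fused.shape)  # torch.Size([10, 256])
```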
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras [37.812681878193914]
Smartphones now have multimodal camera systems with time-of-flight (ToF) depth sensors and multiple color cameras.
Producing accurate high-resolution depth remains challenging due to the low resolution and limited active illumination power of ToF sensors.
We propose an automatic calibration technique based on dense 2D/3D matching that can estimate camera parameters from a single snapshot.
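The dense 2D/3D matching itself is the paper's contribution; given such correspondences, the pose recovery can be illustrated with a standard PnP solve, as in this hedged sketch with synthetic points and placeholder intrinsics.
```python
import cv2
import numpy as np

# Synthetic 2D/3D correspondences standing in for the dense matches.
pts_3d = (np.random.rand(50, 3).astype(np.float32)) * 2.0
K = np.array([[600, 0, 320], [0, 600, 240], [0, 0, 1]], dtype=np.float32)
rvec_gt = np.array([0.05, -0.02, 0.01], dtype=np.float32)  # ground-truth pose
tvec_gt = np.array([0.1, 0.0, 0.3], dtype=np.float32)
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)

# Recover the extrinsics from the correspondences.
ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
print(ok, rvec.ravel(), tvec.ravel())  # matches the ground-truth pose
```
Note that the paper additionally estimates per-shot parameters of the image-stabilized optics, which a plain PnP solve does not cover.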
arXiv Detail & Related papers (2022-10-06T09:57:09Z)
- DELTAR: Depth Estimation from a Light-weight ToF Sensor and RGB Image [39.389538555506256]
We propose DELTAR, a novel method to empower light-weight ToF sensors with the capability of measuring high resolution and accurate depth.
As the core of DELTAR, a feature extractor customized for depth distribution and an attention-based neural architecture are proposed to fuse information from the color and ToF domains efficiently.
Experiments show that our method produces more accurate depth than existing frameworks designed for depth completion and depth super-resolution and achieves on par performance with a commodity-level RGB-D sensor.
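A hedged sketch of attention-based fusion in that spirit: flattened image features attend to per-zone ToF features via cross-attention. Token counts and dimensions are illustrative; DELTAR's actual extractor and architecture are more elaborate.
```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, tof_tokens):
        # img_tokens: [B, N, dim] flattened color-image features (queries)
        # tof_tokens: [B, Z, dim] per-zone ToF depth-distribution features
        fused, _ = self.attn(img_tokens, tof_tokens, tof_tokens)
        return self.norm(img_tokens + fused)  # residual fusion

fusion = CrossModalFusion()
out = fusion(torch.rand(2, 240, 128), torch.rand(2, 64, 128))
print(out.shape)  # torch.Size([2, 240, 128])
```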
arXiv Detail & Related papers (2022-09-27T13:11:37Z)
- Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
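SenFuNet fuses in a volumetric representation; purely as a simplified, image-space illustration of learning sensor-specific weighting, the sketch below predicts per-pixel confidences for two depth sensors and blends them.
```python
import torch
import torch.nn as nn

class DepthFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # One confidence head per sensor captures sensor-specific noise.
        self.conf_a = nn.Conv2d(1, 1, 3, padding=1)
        self.conf_b = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, depth_a, depth_b):
        w = torch.softmax(torch.cat([self.conf_a(depth_a),
                                     self.conf_b(depth_b)], dim=1), dim=1)
        return w[:, :1] * depth_a + w[:, 1:] * depth_b  # weighted blend

fuse = DepthFusion()
fused = fuse(torch.rand(1, 1, 120, 160), torch.rand(1, 1, 120, 160))
print(fused.shape)  # torch.Size([1, 1, 120, 160])
```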
arXiv Detail & Related papers (2022-04-07T10:45:32Z)
- Learning Enriched Illuminants for Cross and Single Sensor Color Constancy [182.4997117953705]
We propose cross-sensor self-supervised training to train the network.
We train the network by randomly sampling the artificial illuminants in a sensor-independent manner.
Experiments show that our cross-sensor model and single-sensor model outperform other state-of-the-art methods by a large margin.
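A minimal sketch of the sensor-independent illuminant sampling described above: a random unit-norm illuminant is applied to a linear image to form a self-supervised (input, target) pair. The sampling range is an assumption.
```python
import numpy as np

rng = np.random.default_rng(42)

def sample_training_pair(linear_img):
    """linear_img: HxWx3 white-balanced linear RGB in [0, 1]."""
    illum = rng.uniform(0.4, 1.0, size=3)   # artificial, sensor-independent
    illum /= np.linalg.norm(illum)          # unit-norm illuminant color
    tinted = np.clip(linear_img * illum, 0.0, 1.0)
    return tinted, illum                    # (network input, regression target)

img = rng.random((64, 64, 3))
x, y = sample_training_pair(img)
print(y)  # ground-truth illuminant the model should recover
```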
arXiv Detail & Related papers (2022-03-21T15:45:35Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
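As a rough structural sketch only (the filtered-transformer fusion is omitted), a shared encoder feeding three task heads might look like this; channel sizes are placeholders.
```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    def __init__(self, in_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, in_ch, 3, padding=1), nn.ReLU())
        self.saliency = nn.Conv2d(in_ch, 1, 1)  # salient object map
        self.depth = nn.Conv2d(in_ch, 1, 1)     # depth estimate
        self.contour = nn.Conv2d(in_ch, 1, 1)   # salient contour map

    def forward(self, rgb):
        f = self.encoder(rgb)                    # shared task-aware features
        return self.saliency(f), self.depth(f), self.contour(f)

model = MultiTaskHead()
sal, dep, con = model(torch.rand(1, 3, 224, 224))
print(sal.shape, dep.shape, con.shape)
```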
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- CalibDNN: Multimodal Sensor Calibration for Perception Using Deep Neural Networks [27.877734292570967]
We propose a novel deep learning-driven technique (CalibDNN) for accurate calibration among multimodal sensors, specifically LiDAR-Camera pairs.
The entire process is fully automatic, using a single model and a single iteration.
Comparisons among different methods and extensive experiments on different datasets demonstrate state-of-the-art performance.
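A hedged structural sketch of such a calibration regressor: it consumes the RGB image plus the LiDAR depth rendered under the current (mis)calibration and outputs a 6-DoF correction. CalibDNN's actual backbone and refinement scheme are not reproduced here.
```python
import torch
import torch.nn as nn

class CalibNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, 5, stride=2, padding=2), nn.ReLU(),  # RGB + depth
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 6)  # [rx, ry, rz, tx, ty, tz] correction

    def forward(self, rgb, lidar_depth):
        x = torch.cat([rgb, lidar_depth], dim=1)
        return self.head(self.features(x).flatten(1))

net = CalibNet()
delta = net(torch.rand(1, 3, 256, 512), torch.rand(1, 1, 256, 512))
print(delta)  # predicted extrinsic correction
```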
arXiv Detail & Related papers (2021-03-27T02:43:37Z)