MMRNet: Improving Reliability for Multimodal Object Detection and
Segmentation for Bin Picking via Multimodal Redundancy
- URL: http://arxiv.org/abs/2210.10842v3
- Date: Sun, 7 May 2023 16:04:40 GMT
- Authors: Yuhao Chen, Hayden Gunraj, E. Zhixuan Zeng, Robbie Meyer, Maximilian
Gilles, Alexander Wong
- Abstract summary: We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet)
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been tremendous interest in Industry 4.0
infrastructure to address labor shortages in global supply chains. Deploying
artificial intelligence-enabled robotic bin picking systems in the real world
has become particularly important for reducing the stress and physical demands
on workers while increasing the speed and efficiency of warehouses. To this
end, artificial intelligence-enabled robotic bin picking systems may be used to
automate order picking, but with the risk of causing expensive damage during an
abnormal event such as a sensor failure. As such, reliability becomes a
critical factor for translating artificial intelligence research into
real-world applications and products. In this paper, we propose a reliable
object detection and segmentation system with MultiModal Redundancy (MMRNet)
for tackling object detection and segmentation for robotic bin picking using
data from different modalities. This is the first system to introduce the
concept of multimodal redundancy to address sensor failure issues during
deployment. In particular, we realize the multimodal redundancy framework with
a gate fusion module and dynamic ensemble learning. Finally, we present a new
label-free multi-modal consistency (MC) score that uses the outputs from all
modalities to measure the overall reliability and uncertainty of the system's
output. Through experiments, we demonstrate that in the event of a missing
modality, our system provides much more reliable performance than baseline
models. We also demonstrate that our MC score is a more reliable indicator of
output quality at inference time than model-generated confidence scores, which
are often over-confident.
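The exact formulation of the MC score is not given in this listing; a minimal sketch of the idea, assuming the score is the mean pairwise IoU between the segmentation masks produced by each modality branch (the `mask_iou`/`mc_score` names and the toy masks are hypothetical, not from the paper):

```python
from itertools import combinations

import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 1.0


def mc_score(masks: list) -> float:
    """Label-free consistency: mean pairwise IoU of per-modality masks.

    A hypothetical stand-in for MMRNet's MC score: strong agreement
    across modalities gives a score near 1; disagreement (e.g. a
    degraded or failed sensor) pulls the score toward 0.
    """
    pairs = list(combinations(masks, 2))
    return sum(mask_iou(a, b) for a, b in pairs) / len(pairs)


# Toy example: RGB and depth branches roughly agree; a noisy branch does not.
rgb = np.zeros((8, 8), dtype=bool)
rgb[2:6, 2:6] = True
depth = np.zeros((8, 8), dtype=bool)
depth[2:6, 2:5] = True
noisy = np.zeros((8, 8), dtype=bool)
noisy[0:2, 0:2] = True

print(round(mc_score([rgb, depth]), 2))         # → 0.75
print(round(mc_score([rgb, depth, noisy]), 2))  # → 0.25
```

Because the score needs no ground-truth labels, it can be computed on every inference, which is what makes it usable as a deployment-time reliability signal where held-out accuracy cannot be measured.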
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z) - SAMSA: Efficient Transformer for Many Data Modalities [12.7600763629179]
We propose SAMSA - SAMpling-Self-Attention, a context-aware linear complexity self-attention mechanism.
Our mechanism is based on a differentiable sampling without replacement method we discovered.
SAMSA achieved competitive or even SOTA results on many benchmarks, while being faster in inference, compared to other very specialized models.
arXiv Detail & Related papers (2024-08-10T00:09:06Z) - Efficient Multi-Resolution Fusion for Remote Sensing Data with Label
Uncertainty [0.7832189413179361]
This paper presents a new method for fusing multi-modal and multi-resolution remote sensor data without requiring pixel-level training labels.
We propose a new method based on binary fuzzy measures, which reduces the search space and significantly improves the efficiency of the MIMRF framework.
arXiv Detail & Related papers (2024-02-07T17:34:32Z) - Practical Anomaly Detection over Multivariate Monitoring Metrics for
Online Services [29.37493773435177]
CMAnomaly is an anomaly detection framework on multivariate monitoring metrics based on collaborative machine learning.
The proposed framework is extensively evaluated with both public data and industrial data collected from a large-scale online service system of Huawei Cloud.
Compared with state-of-the-art baseline models, CMAnomaly achieves an average F1 score of 0.9494, outperforming baselines by 6.77% to 10.68%, and runs 10X to 20X faster.
arXiv Detail & Related papers (2023-08-19T08:08:05Z) - Robust Multimodal Failure Detection for Microservice Systems [32.25907616511765]
AnoFusion is an unsupervised failure detection approach for microservice systems.
It learns the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) and Gated Recurrent Unit (GRU)
It achieves the F1-score of 0.857 and 0.922, respectively, outperforming state-of-the-art failure detection approaches.
arXiv Detail & Related papers (2023-05-30T12:39:42Z) - Distributional Instance Segmentation: Modeling Uncertainty and High
Confidence Predictions with Latent-MaskRCNN [77.0623472106488]
In this paper, we explore a class of distributional instance segmentation models using latent codes.
For robotic picking applications, we propose a confidence mask method to achieve the high precision necessary.
We show that our method can significantly reduce critical errors in robotic systems, including our newly released dataset of ambiguous scenes.
arXiv Detail & Related papers (2023-05-03T05:57:29Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Enhancing Multi-Robot Perception via Learned Data Association [37.866254392010454]
We address the multi-robot collaborative perception problem, specifically in the context of multi-view infilling for distributed semantic segmentation.
We propose the Multi-Agent Infilling Network: a neural architecture that can be deployed to each agent in a robotic swarm.
Specifically, each robot is in charge of locally encoding and decoding visual information, and a neural mechanism allows for an uncertainty-aware and context-based exchange of intermediate features.
arXiv Detail & Related papers (2021-07-01T22:45:26Z) - Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z) - Integrated Benchmarking and Design for Reproducible and Accessible
Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.