Ensemble Learning for Fusion of Multiview Vision with Occlusion and
Missing Information: Framework and Evaluations with Real-World Data and
Applications in Driver Hand Activity Recognition
- URL: http://arxiv.org/abs/2301.12592v2
- Date: Fri, 29 Sep 2023 02:24:34 GMT
- Title: Ensemble Learning for Fusion of Multiview Vision with Occlusion and
Missing Information: Framework and Evaluations with Real-World Data and
Applications in Driver Hand Activity Recognition
- Authors: Ross Greer, Mohan Trivedi
- Abstract summary: Multi-sensor frameworks provide opportunities for ensemble learning and sensor fusion.
We propose and analyze an imputation scheme to handle missing information.
We show that a late-fusion approach between parallel convolutional neural networks can outperform even the best-placed single camera model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-sensor frameworks provide opportunities for ensemble learning and
sensor fusion to make use of redundancy and supplemental information, helpful
in real-world safety applications such as continuous driver state monitoring,
which necessitate predictions even when information is intermittently missing.
We define this problem of intermittent instances of
missing information (by occlusion, noise, or sensor failure) and design a
learning framework around these data gaps, proposing and analyzing an
imputation scheme to handle missing information. We apply these ideas to tasks
in camera-based hand activity classification for robust safety during
autonomous driving. We show that a late-fusion approach between parallel
convolutional neural networks can outperform even the best-placed single-camera
model in estimating the hands' held objects and positions when validated on
within-group subjects; that our multi-camera framework performs best on average
in cross-group validation; and that the fusion approach outperforms ensemble
weighted-majority and model-combination schemes.
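For concreteness, here is a minimal PyTorch sketch of the general idea: parallel per-camera CNN branches, a learned placeholder feature imputed for any missing view, and a late-fusion head over the concatenated branch features. All module names, dimensions, and the learned-placeholder imputation choice are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LateFusionHandNet(nn.Module):
    """Sketch: parallel per-camera CNNs with late fusion; a missing view is
    replaced by a learned per-camera imputation vector (one plausible choice)."""

    def __init__(self, num_cameras=3, feat_dim=128, num_classes=6):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            for _ in range(num_cameras)
        ])
        # One learned placeholder feature per camera, used when that view is missing.
        self.imputed = nn.Parameter(torch.zeros(num_cameras, feat_dim))
        self.fusion_head = nn.Linear(num_cameras * feat_dim, num_classes)

    def forward(self, views, present):
        # views: list of (B, 3, H, W) tensors, one per camera
        # present: (B, num_cameras) bool mask; False = occlusion/noise/sensor failure
        feats = []
        for i, (branch, x) in enumerate(zip(self.branches, views)):
            keep = present[:, i].unsqueeze(1).float()
            feats.append(keep * branch(x) + (1 - keep) * self.imputed[i])
        return self.fusion_head(torch.cat(feats, dim=1))

def weighted_majority(logits_per_cam, weights):
    # Decision-level ensemble baseline the fusion approach is compared against:
    # per-camera class votes weighted by fixed scalars, no joint feature learning.
    votes = [w * l.softmax(dim=1) for l, w in zip(logits_per_cam, weights)]
    return torch.stack(votes).sum(dim=0).argmax(dim=1)

model = LateFusionHandNet()
views = [torch.randn(4, 3, 64, 64) for _ in range(3)]
present = torch.tensor([[True, True, False]] * 4)  # camera 2 occluded this frame
logits = model(views, present)                     # (4, num_classes)
```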
Related papers
- CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions [13.981748780317329]
Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs).
This study introduces a novel accident anticipation framework for AVs, termed CRASH.
It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion.
Our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA).
arXiv Detail & Related papers (2024-07-25T04:12:49Z)
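Purely as a schematic of how those five named components could be wired (every submodule below is a placeholder stand-in, not the CRASH architecture):

```python
import torch
import torch.nn as nn

class CrashAnticipationSketch(nn.Module):
    """Schematic wiring of the five named CRASH components; all internals
    are placeholder stand-ins, not the paper's design."""

    def __init__(self, d=64):
        super().__init__()
        self.object_detector = nn.Identity()        # stand-in: yields per-object crops
        self.feature_extractor = nn.Linear(16, d)   # stand-in: crop -> feature
        self.object_aware = nn.MultiheadAttention(d, 4, batch_first=True)
        self.context_aware = nn.MultiheadAttention(d, 4, batch_first=True)
        self.fusion = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, object_feats, context_feats):
        # object_feats: (B, N_obj, 16); context_feats: (B, N_ctx, d)
        obj = self.feature_extractor(self.object_detector(object_feats))
        obj, _ = self.object_aware(obj, obj, obj)                       # object-object relations
        ctx, _ = self.context_aware(obj, context_feats, context_feats)  # scene context
        fused = torch.cat([obj, ctx], dim=-1)
        return torch.sigmoid(self.fusion(fused).mean(dim=1))            # accident probability
```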
- Continual Road-Scene Semantic Segmentation via Feature-Aligned Symmetric Multi-Modal Network [15.196758664999455]
We re-frame the task of multimodal semantic segmentation by enforcing a tightly coupled feature representation and a symmetric information-sharing scheme.
We also introduce an ad-hoc class-incremental continual learning scheme, proving our approach's effectiveness and reliability even in safety-critical settings.
arXiv Detail & Related papers (2023-08-09T04:46:16Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
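A minimal sketch of that projection idea, with per-modality maps into one shared space (modality names and dimensions below are assumptions):

```python
import torch
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    """Project variable-size per-modality features into one common space so
    that modality combinations unseen during training remain consumable."""

    def __init__(self, modality_dims, common_dim=256):
        super().__init__()
        self.proj = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, common_dim), nn.LayerNorm(common_dim))
            for name, dim in modality_dims.items()
        })

    def forward(self, feats):
        # feats: dict mapping modality name -> (B, dim) features; any subset works.
        projected = [self.proj[name](x) for name, x in feats.items()]
        return torch.stack(projected).mean(dim=0)  # (B, common_dim)

proj = CommonSpaceProjector({"rgb": 512, "depth": 128, "audio": 64})
fused = proj({"rgb": torch.randn(2, 512), "audio": torch.randn(2, 64)})  # pair unseen in training
```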
- MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy [68.7563053122698]
We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet).
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
arXiv Detail & Related papers (2022-10-19T19:15:07Z)
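MMRNet's exact MC formula is defined in its paper; as an illustrative label-free stand-in, cross-modality agreement can be scored as mean pairwise IoU of the per-modality masks:

```python
import itertools
import torch

def consistency_score(masks, eps=1e-6):
    """Label-free agreement proxy: mean pairwise IoU over per-modality binary
    masks of shape (B, H, W). Low scores flag unreliable or failed sensors.
    Illustrative stand-in only, not MMRNet's published MC formula."""
    scores = []
    for a, b in itertools.combinations(masks, 2):
        inter = (a & b).float().sum(dim=(1, 2))
        union = (a | b).float().sum(dim=(1, 2))
        scores.append(inter / (union + eps))
    return torch.stack(scores).mean(dim=0)  # (B,)

rgb_mask = torch.rand(4, 32, 32) > 0.5
depth_mask = torch.rand(4, 32, 32) > 0.5
print(consistency_score([rgb_mask, depth_mask]))
```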
- Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer [28.15612357340141]
We propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser).
We process and fuse information from multi-modal multi-view sensors to achieve comprehensive scene understanding and adversarial event detection.
Our framework provides richer semantics, which are exploited to better constrain actions to stay within safe sets.
arXiv Detail & Related papers (2022-07-28T11:36:21Z)
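The core fusion step (multi-modal, multi-view tokens processed jointly) can be sketched with a standard transformer encoder; the per-sensor tagging and sizes below are assumptions, not InterFuser's actual design:

```python
import torch
import torch.nn as nn

class SensorTokenFusion(nn.Module):
    """Sketch: flatten per-sensor feature maps into tokens, tag each token with
    a learned which-sensor embedding, and fuse with a transformer encoder."""

    def __init__(self, num_sensors=4, d=128, layers=2):
        super().__init__()
        self.sensor_embed = nn.Embedding(num_sensors, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens_per_sensor):
        # tokens_per_sensor[i]: (B, N_i, d) tokens from sensor/view i
        tagged = [t + self.sensor_embed.weight[i] for i, t in enumerate(tokens_per_sensor)]
        return self.encoder(torch.cat(tagged, dim=1))  # (B, sum(N_i), d)

fuser = SensorTokenFusion()
fused = fuser([torch.randn(2, 8, 128) for _ in range(4)])  # e.g., 3 cameras + 1 LiDAR BEV
```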
- Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
Federated learning-empowered connected autonomous vehicles (FLCAV) have been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training.
arXiv Detail & Related papers (2022-06-03T23:55:45Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
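A toy rendering of multi-view consistency as a training signal: warp each view's detection heatmap into a shared ground plane and penalize cross-view disagreement. The precomputed sampling grids below stand in for the paper's geometric constraints:

```python
import torch
import torch.nn.functional as F

def multiview_consensus_loss(heatmaps, grids):
    """Toy consistency loss. heatmaps[i]: (B, 1, H, W) per-view detections;
    grids[i]: (B, Hg, Wg, 2) precomputed sampling grid into a common ground
    plane (assumed given, e.g. from calibrated homographies)."""
    warped = [F.grid_sample(h, g, align_corners=False) for h, g in zip(heatmaps, grids)]
    stack = torch.stack(warped)                    # (V, B, 1, Hg, Wg)
    consensus = stack.mean(dim=0, keepdim=True)    # cross-view agreement target
    return ((stack - consensus) ** 2).mean()
```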
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for the safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
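A rough sketch of attention-derived track embeddings with soft (non-greedy) association; SoDA's actual machinery is richer, and the similarity softmax below is an assumed simplification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrackEmbedder(nn.Module):
    """Self-attention over the detections in a frame yields embeddings that
    encode inter-object dependencies (sketch only)."""

    def __init__(self, d=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, det_feats):
        # det_feats: (B, N_detections, d) appearance/motion features
        emb, _ = self.attn(det_feats, det_feats, det_feats)
        return emb

def soft_association(prev_emb, curr_emb, temperature=0.1):
    # Soft matching: row-softmax over cosine similarities instead of hard assignment.
    prev = F.normalize(prev_emb, dim=-1)
    curr = F.normalize(curr_emb, dim=-1)
    return ((prev @ curr.transpose(-2, -1)) / temperature).softmax(dim=-1)
```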
- Federated Self-Supervised Learning of Multi-Sensor Representations for Embedded Intelligence [8.110949636804772]
Smartphones, wearables, and Internet of Things (IoT) devices produce a wealth of data that cannot be accumulated in a centralized repository for learning supervised models.
We propose a self-supervised approach termed scalogram-signal correspondence learning based on wavelet transform to learn useful representations from unlabeled sensor inputs.
We extensively assess the quality of learned features with our multi-view strategy on diverse public datasets, achieving strong performance in all domains.
arXiv Detail & Related papers (2020-07-25T21:59:17Z)
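A minimal sketch of the scalogram-signal pairing using PyWavelets; the tiny encoders and cosine correspondence score below are assumptions standing in for the paper's training objective:

```python
import numpy as np
import pywt
import torch
import torch.nn as nn
import torch.nn.functional as F

def scalogram(signal, scales=np.arange(1, 33), wavelet="morl"):
    """Continuous wavelet transform of a 1-D window -> (scales, time) image."""
    coeffs, _ = pywt.cwt(signal, scales, wavelet)
    return torch.tensor(np.abs(coeffs), dtype=torch.float32)

# Two small encoders map the raw window and its scalogram into a shared space;
# training would pull matching (signal, scalogram) pairs together, mismatches apart.
signal_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(64))
scalo_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(64))

window = np.random.randn(256).astype(np.float32)   # one unlabeled sensor window
z_sig = signal_enc(torch.tensor(window).unsqueeze(0))
z_img = scalo_enc(scalogram(window).unsqueeze(0))
match = F.cosine_similarity(z_sig, z_img)          # correspondence score
```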