ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion
Approach for Referencing Outside Objects From a Moving Vehicle
- URL: http://arxiv.org/abs/2111.02327v1
- Date: Wed, 3 Nov 2021 16:22:17 GMT
- Authors: Amr Gomaa, Guillermo Reyes, Michael Feld
- Abstract summary: We propose a learning-based multimodal fusion approach for referencing outside-the-vehicle objects while driving along a long route in a simulated environment.
We also demonstrate possible ways to exploit behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past decades, the addition of hundreds of sensors to modern vehicles
has led to an exponential increase in their capabilities. This allows for novel
approaches to interaction with the vehicle that go beyond traditional
touch-based and voice command approaches, such as emotion recognition, head
rotation, eye gaze, and pointing gestures. Although gaze and pointing gestures
have been used before for referencing objects inside and outside vehicles, the
multimodal interaction and fusion of these gestures have so far not been
extensively studied. We propose a novel learning-based multimodal fusion
approach for referencing outside-the-vehicle objects while driving along a long
route in a simulated environment. The proposed multimodal approaches
outperform single-modality approaches in multiple aspects and conditions.
Moreover, we demonstrate possible ways to exploit behavioral differences
between users when completing the referencing task to realize an adaptable
personalized system for each driver. We propose a personalization technique
based on the transfer-of-learning concept for exceedingly small data sizes to
enhance prediction and adapt to individualistic referencing behavior. Our code
is publicly available at https://github.com/amr-gomaa/ML-PersRef.
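The linked repository contains the actual implementation; as a rough illustration of the two core ideas, late fusion of gaze, head-pose, and pointing features plus per-driver personalization by fine-tuning on very little data, here is a minimal PyTorch sketch. All feature dimensions, layer sizes, and names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of learning-based multimodal fusion with per-driver
# personalization via transfer learning. Dimensions, layer sizes, and
# training details are illustrative assumptions, not the actual
# architecture (see https://github.com/amr-gomaa/ML-PersRef).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Fuses gaze, head-pose, and pointing features into one prediction."""
    def __init__(self, gaze_dim=3, head_dim=3, point_dim=3, out_dim=2):
        super().__init__()
        # One small encoder per modality.
        self.gaze_enc = nn.Sequential(nn.Linear(gaze_dim, 16), nn.ReLU())
        self.head_enc = nn.Sequential(nn.Linear(head_dim, 16), nn.ReLU())
        self.point_enc = nn.Sequential(nn.Linear(point_dim, 16), nn.ReLU())
        # Fusion head maps the concatenated encodings to the referenced
        # object's location (here: two assumed angular coordinates).
        self.head_out = nn.Sequential(
            nn.Linear(48, 32), nn.ReLU(), nn.Linear(32, out_dim)
        )

    def forward(self, gaze, head, point):
        z = torch.cat(
            [self.gaze_enc(gaze), self.head_enc(head), self.point_enc(point)],
            dim=-1,
        )
        return self.head_out(z)

def personalize(model, driver_x, driver_y, epochs=20, lr=1e-3):
    """Adapt a pretrained fusion model to one driver's few samples by
    freezing the modality encoders and fine-tuning only the fusion head."""
    for enc in (model.gaze_enc, model.head_enc, model.point_enc):
        for p in enc.parameters():
            p.requires_grad = False
    opt = torch.optim.Adam(model.head_out.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(*driver_x), driver_y)
        loss.backward()
        opt.step()
    return model
```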
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks; a hedged sketch of the dual-stream interaction idea follows this entry.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
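The entry above keeps individual per-modality representations while letting them interact. Below is a minimal, hedged sketch of that general idea using bidirectional cross-attention; shapes, layer count, and names are assumptions for illustration, not the DeepInteraction++ design.

```python
# Two modality streams exchange information via cross-attention while
# each keeps its own representation throughout. Illustrative only.
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.img_from_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, lidar_tokens):
        # Each stream attends to the other, then adds the result to its
        # own representation (residual), so neither stream is collapsed.
        img_upd, _ = self.img_from_lidar(img_tokens, lidar_tokens, lidar_tokens)
        lidar_upd, _ = self.lidar_from_img(lidar_tokens, img_tokens, img_tokens)
        return img_tokens + img_upd, lidar_tokens + lidar_upd

# Usage: stack a few layers; both modalities stay separate throughout.
layer = InteractionLayer()
img = torch.randn(2, 100, 64)    # stand-in image feature tokens
lidar = torch.randn(2, 200, 64)  # stand-in LiDAR feature tokens
img, lidar = layer(img, lidar)
```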
- Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z)
- Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers [0.0]
The rapid advancement of the automotive industry has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving related tasks, such as referencing objects outside of the vehicle.
We propose IcRegress, a novel regression-based incremental learning approach that adapts to drivers' changing behavior and unique characteristics as they engage in the dual task of driving and referencing objects; a hedged sketch of the incremental-learning paradigm follows this entry.
arXiv Detail & Related papers (2024-01-29T12:48:56Z)
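The entry above describes regression-based incremental learning that adapts to individual drivers. As a hedged illustration of the paradigm only (not the IcRegress model), the sketch below updates a generic scikit-learn regressor online as a driver's referencing samples stream in; the feature and target definitions are made up for the example.

```python
# Incremental, per-driver regression: the model is updated online as new
# referencing samples arrive. SGDRegressor is a generic stand-in here,
# not the IcRegress architecture.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)

def on_new_sample(features: np.ndarray, target: float):
    """Incrementally adapt to one driver's latest referencing sample."""
    model.partial_fit(features.reshape(1, -1), np.array([target]))

# Simulated stream of (multimodal features, referenced-object) pairs.
rng = np.random.default_rng(0)
for _ in range(50):
    x = rng.normal(size=9)   # stand-in gaze + head + pointing features
    y = float(x[:3].mean())  # toy target for illustration only
    on_new_sample(x, y)
```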
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an end-to-end, open-set (any environment/scene) autonomous driving approach capable of producing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Adaptive User-Centered Multimodal Interaction towards Reliable and Trusted Automotive Interfaces [0.0]
Hand gestures, head pose, eye gaze, and speech have been investigated in automotive applications for object selection and referencing.
I propose a user-centered adaptive multimodal fusion approach for referencing external objects from a moving vehicle.
arXiv Detail & Related papers (2022-11-07T13:31:00Z)
- Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle [0.0]
We use multiple modalities to achieve natural human-machine interaction for the object-referencing task.
By tracking eye-gaze, head, and finger movements, we design a multimodal fusion architecture.
We propose a method to identify whether the driver's referenced object lies inside or outside the vehicle; a hedged sketch of such a classifier follows this entry.
arXiv Detail & Related papers (2022-02-15T12:40:13Z)
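The entry above fuses eye-gaze, head, and finger tracking and decides whether a referenced object is inside or outside the vehicle. The sketch below shows one hedged way such a binary decision could sit on top of fused features; the feature layout and network are illustrative assumptions, not the paper's method.

```python
# Binary inside/outside decision on top of fused modality features.
# Feature layout and model are assumptions for illustration.
import torch
import torch.nn as nn

class InsideOutsideClassifier(nn.Module):
    def __init__(self, feat_dim=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 16), nn.ReLU(), nn.Linear(16, 1)
        )

    def forward(self, fused_features):
        # Returns the probability that the referenced object is outside.
        return torch.sigmoid(self.net(fused_features))

clf = InsideOutsideClassifier()
sample = torch.randn(1, 9)  # concatenated gaze + head + finger features
p_outside = clf(sample)
print("outside" if p_outside.item() > 0.5 else "inside")
```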
- Learning Interactive Driving Policies via Data-driven Simulation [125.97811179463542]
Data-driven simulators promise high data-efficiency for driving policy learning.
Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving.
We propose a simulation method that uses in-painted ado vehicles for learning robust driving policies.
arXiv Detail & Related papers (2021-11-23T20:14:02Z)
- Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects [0.0]
We utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle.
We use features from gaze, head pose and finger pointing simultaneously to precisely predict the referenced objects in different car poses.
arXiv Detail & Related papers (2021-07-26T12:37:06Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion; a hedged sketch of attention-based fusion follows this entry.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
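The entry above describes attention-based fusion of image and LiDAR representations. Below is a minimal, hedged sketch of that general idea; the token shapes, single attention layer, and control head are assumptions for illustration, not the TransFuser architecture.

```python
# Attention-based fusion of image and LiDAR tokens into one driving
# representation. Dimensions and pooling are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)  # e.g. steering and throttle

    def forward(self, img_tokens, lidar_tokens):
        # Joint self-attention over the concatenated token sets lets
        # image and LiDAR features exchange information globally.
        tokens = torch.cat([img_tokens, lidar_tokens], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.head(fused.mean(dim=1))  # pool to one control output

fusion = AttentionFusion()
out = fusion(torch.randn(2, 100, 64), torch.randn(2, 200, 64))
```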
- Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)