Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of
Outside-Vehicle Objects
- URL: http://arxiv.org/abs/2107.12167v1
- Date: Mon, 26 Jul 2021 12:37:06 GMT
- Title: Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of
Outside-Vehicle Objects
- Authors: Abdul Rafey Aftab, Michael von der Beeck, Steven Rohrhirsch, Benoit
Diotte, Michael Feld
- Abstract summary: We utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle.
We use features from gaze, head pose and finger pointing simultaneously to precisely predict the referenced objects in different car poses.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There is a growing interest in more intelligent natural user interaction with
the car. Hand gestures and speech are already being applied for driver-car
interaction. Moreover, multimodal approaches are also showing promise in the
automotive industry. In this paper, we utilize deep learning for a multimodal
fusion network for referencing objects outside the vehicle. We use features
from gaze, head pose and finger pointing simultaneously to precisely predict
the referenced objects in different car poses. We demonstrate the practical
limitations of each modality when used for a natural form of referencing,
specifically inside the car. As evident from our results, we overcome the
modality-specific limitations, to a large extent, by the addition of other
modalities. This work highlights the importance of multimodal sensing,
especially when moving towards natural user interaction. Furthermore, our
user-based analysis shows noteworthy differences in the recognition of user behavior
depending upon the vehicle pose.
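To make the fusion idea concrete, below is a minimal sketch (PyTorch) of a late-fusion network over gaze, head-pose, and finger-pointing features, assuming each modality is available as a fixed-length feature vector per frame. The class names, feature dimensions, and the candidate-object output head are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Small MLP that embeds one modality's feature vector."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultimodalReferencingNet(nn.Module):
    """Fuses gaze, head-pose and pointing embeddings to score candidate objects."""

    # Feature dimensions and the number of candidate objects are assumed values.
    def __init__(self, gaze_dim=3, head_dim=6, point_dim=3, num_objects=10):
        super().__init__()
        self.gaze_enc = ModalityEncoder(gaze_dim)
        self.head_enc = ModalityEncoder(head_dim)
        self.point_enc = ModalityEncoder(point_dim)
        self.fusion = nn.Sequential(
            nn.Linear(64 * 3, 128),
            nn.ReLU(),
            nn.Linear(128, num_objects),  # logits over candidate outside-vehicle objects
        )

    def forward(self, gaze, head, point):
        # Encode each modality separately, then fuse by concatenation.
        fused = torch.cat(
            [self.gaze_enc(gaze), self.head_enc(head), self.point_enc(point)], dim=-1
        )
        return self.fusion(fused)


# Usage with a batch of 4 synthetic samples (random stand-ins for real sensor features).
model = MultimodalReferencingNet()
logits = model(torch.randn(4, 3), torch.randn(4, 6), torch.randn(4, 3))
probs = logits.softmax(dim=-1)  # per-object reference probabilities

The sketch uses simple concatenation-based late fusion over per-modality encoders; the paper's actual feature extraction, fusion strategy, and training setup are described in the full text.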
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers [0.0]
The rapid advancement of the automotive industry has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving-related tasks, such as referencing objects outside of the vehicle.
We propose IcRegress, a novel regression-based incremental learning approach that adapts to changing behavior and the unique characteristics of drivers engaged in the dual task of driving and referencing objects.
arXiv Detail & Related papers (2024-01-29T12:48:56Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Adaptive User-Centered Multimodal Interaction towards Reliable and Trusted Automotive Interfaces [0.0]
Hand gestures, head pose, eye gaze, and speech have been investigated in automotive applications for object selection and referencing.
I propose a user-centered adaptive multimodal fusion approach for referencing external objects from a moving vehicle.
arXiv Detail & Related papers (2022-11-07T13:31:00Z)
- Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle [0.0]
We use multiple modalities to achieve natural human-machine interaction for a specific task.
By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture.
We propose a method to identify whether the driver's referenced object lies inside or outside the vehicle.
arXiv Detail & Related papers (2022-02-15T12:40:13Z)
- ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle [0.0]
We propose a learning-based multimodal fusion approach for referencing outside-the-vehicle objects while maintaining a long driving route in a simulated environment.
We also demonstrate possible ways to exploit behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver.
arXiv Detail & Related papers (2021-11-03T16:22:17Z)
- SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving [96.50297622371457]
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world.
Despite more than a decade of research and development, the problem of how to interact with diverse road users in diverse scenarios remains largely unsolved.
We develop a dedicated simulation platform called SMARTS that generates diverse and competent driving interactions.
arXiv Detail & Related papers (2020-10-19T18:26:10Z)
- Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)
- V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction [74.42961817119283]
We use vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles.
By intelligently aggregating the information received from multiple nearby vehicles, we can observe the same scene from different viewpoints.
arXiv Detail & Related papers (2020-08-17T17:58:26Z)
- Explicit Domain Adaptation with Loosely Coupled Samples [85.9511585604837]
We propose a transfer learning framework, the core of which is learning an explicit mapping between domains.
Due to its interpretability, this is beneficial for safety-critical applications, like autonomous driving.
arXiv Detail & Related papers (2020-04-24T21:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.