Adaptive User-Centered Multimodal Interaction towards Reliable and
Trusted Automotive Interfaces
- URL: http://arxiv.org/abs/2211.03539v1
- Date: Mon, 7 Nov 2022 13:31:00 GMT
- Title: Adaptive User-Centered Multimodal Interaction towards Reliable and
Trusted Automotive Interfaces
- Authors: Amr Gomaa
- Abstract summary: Hand gestures, head pose, eye gaze, and speech have been investigated in automotive applications for object selection and referencing.
I propose a user-centered adaptive multimodal fusion approach for referencing external objects from a moving vehicle.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the recently increasing capabilities of modern vehicles, novel
approaches for interaction emerged that go beyond traditional touch-based and
voice command approaches. Therefore, hand gestures, head pose, eye gaze, and
speech have been extensively investigated in automotive applications for object
selection and referencing. Despite these significant advances, existing
approaches mostly employ a one-model-fits-all approach unsuitable for varying
user behavior and individual differences. Moreover, current referencing
approaches either consider these modalities separately or focus on a stationary
situation, whereas the situation in a moving vehicle is highly dynamic and
subject to safety-critical constraints. In this paper, I propose a research
plan for a user-centered adaptive multimodal fusion approach for referencing
external objects from a moving vehicle. The proposed plan aims to provide an
open-source framework for user-centered adaptation and personalization using
user observations and heuristics, multimodal fusion, clustering,
transfer-of-learning for model adaptation, and continuous learning, moving
towards trusted human-centered artificial intelligence.
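As a rough illustration of the adaptive fusion idea described in the abstract, the following minimal sketch combines per-modality object scores (eye gaze, head pose, hand pointing) through per-user weights that are updated from driver feedback. The class name, the weighting scheme, and the update rule are illustrative assumptions, not the framework proposed in the paper.

```python
# Minimal sketch (not the author's implementation) of user-centered adaptive
# late fusion for outside-the-vehicle object referencing. Each modality yields
# a probability distribution over candidate objects; per-user modality weights
# are adapted from how well each modality ranked the confirmed object.

import numpy as np


class AdaptiveFusion:
    """Per-user weighted late fusion over a fixed set of candidate objects."""

    def __init__(self, modalities=("gaze", "head", "pointing"), lr=0.1):
        self.modalities = modalities
        self.lr = lr  # adaptation rate for the per-user weights
        # Start from a uniform, user-agnostic ("one-model-fits-all") prior.
        self.weights = {m: 1.0 / len(modalities) for m in modalities}

    def fuse(self, scores):
        """scores: dict modality -> np.ndarray of per-object probabilities."""
        fused = sum(self.weights[m] * scores[m] for m in self.modalities)
        return int(np.argmax(fused)), fused

    def adapt(self, scores, true_object):
        """Shift weight toward modalities that ranked the true object highly."""
        for m in self.modalities:
            reward = scores[m][true_object]  # this modality's score for the true object
            self.weights[m] = (1 - self.lr) * self.weights[m] + self.lr * reward
        total = sum(self.weights.values())
        self.weights = {m: w / total for m, w in self.weights.items()}


if __name__ == "__main__":
    # Example with four candidate objects and synthetic per-modality scores.
    fusion = AdaptiveFusion()
    scores = {
        "gaze": np.array([0.1, 0.6, 0.2, 0.1]),
        "head": np.array([0.3, 0.3, 0.3, 0.1]),
        "pointing": np.array([0.1, 0.2, 0.6, 0.1]),
    }
    predicted, _ = fusion.fuse(scores)
    fusion.adapt(scores, true_object=1)  # driver confirms object 1
    print(predicted, fusion.weights)
```

Late fusion with explicit per-user weights is only one possible realization; the proposed plan also mentions clustering, transfer learning, and continuous learning as adaptation mechanisms.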
Related papers
- Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving [65.04643267731122]
General MLLMs combined with CLIP often struggle to represent driving-specific scenarios accurately.
We propose the Hints of Prompt (HoP) framework, which introduces three key enhancements.
These hints are fused through a Hint Fusion module, enriching visual representations and enhancing multimodal reasoning.
arXiv Detail & Related papers (2024-11-20T06:58:33Z)
- Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability [0.0]
Generative AI emerges as a key driver in reshaping user interfaces.
This paper explores the integration of generative AI in modern user interfaces.
It focuses on multimodal interaction, cross-platform adaptability and dynamic personalization.
arXiv Detail & Related papers (2024-11-15T14:49:58Z)
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers [0.0]
The rapid advancement of the automotive industry has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving related tasks, such as referencing objects outside of the vehicle.
We propose IcRegress, a novel regression-based incremental learning approach that adapts to changing behavior and the unique characteristics of drivers engaged in the dual task of driving and referencing objects (a minimal incremental-learning sketch appears after this list).
arXiv Detail & Related papers (2024-01-29T12:48:56Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z)
- ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle [0.0]
We propose a learning-based multimodal fusion approach for referencing outside-the-vehicle objects while maintaining a long driving route in a simulated environment.
We also demonstrate possible ways to exploit behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver.
arXiv Detail & Related papers (2021-11-03T16:22:17Z)
- Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects [0.0]
We utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle.
We use features from gaze, head pose and finger pointing simultaneously to precisely predict the referenced objects in different car poses.
arXiv Detail & Related papers (2021-07-26T12:37:06Z)
- Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)
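A minimal sketch of the incremental per-driver adaptation idea behind IcRegress and ML-PersRef (referenced in the list above): a regressor pre-trained on pooled data from many drivers keeps updating with small batches from a new driver. The feature layout, targets, and hyperparameters are illustrative assumptions, not the papers' implementations.

```python
# Sketch of incremental (online) adaptation of a referencing regressor to an
# individual driver. Hypothetical features: [gaze_yaw, gaze_pitch, head_yaw,
# head_pitch, point_yaw, point_pitch]; target: bearing of the referenced object.

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Generic model pre-trained on pooled data from many drivers.
X_pool = rng.normal(size=(500, 6))
y_pool = X_pool @ np.array([0.4, 0.1, 0.2, 0.05, 0.3, 0.1]) + rng.normal(0.0, 0.05, 500)
model = SGDRegressor(learning_rate="constant", eta0=0.01)
model.fit(X_pool, y_pool)

# A new driver arrives with a personal bias (e.g., consistently points slightly
# ahead of the object); the model adapts incrementally, batch by batch.
for _ in range(20):
    X_new = rng.normal(size=(8, 6))
    y_new = X_new @ np.array([0.5, 0.1, 0.2, 0.05, 0.25, 0.1]) + 0.2  # driver-specific offset
    model.partial_fit(X_new, y_new)  # incremental update; old data is not revisited

print("adapted intercept:", model.intercept_)
```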