Multimodal Driver Referencing: A Comparison of Pointing to Objects
Inside and Outside the Vehicle
- URL: http://arxiv.org/abs/2202.07360v1
- Date: Tue, 15 Feb 2022 12:40:13 GMT
- Title: Multimodal Driver Referencing: A Comparison of Pointing to Objects
Inside and Outside the Vehicle
- Authors: Abdul Rafey Aftab, Michael von der Beeck
- Abstract summary: We use multiple modalities to achieve natural human-machine interaction for a specific task.
By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture.
We propose a method to identify whether the driver's referenced object lies inside or outside the vehicle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Advances in in-cabin sensing technologies, especially vision-based
approaches, have greatly improved user interaction inside the vehicle, paving the
way for new applications of natural user interaction. Just as humans use
multiple modes to communicate with each other, we follow an approach which is
characterized by simultaneously using multiple modalities to achieve natural
human-machine interaction for a specific task: pointing to or glancing towards
objects inside as well as outside the vehicle for deictic references. By
tracking the movements of eye-gaze, head and finger, we design a multimodal
fusion architecture using a deep neural network to precisely identify the
driver's referencing intent. Additionally, we use a speech command as a trigger
to separate each referencing event. We observe differences in driver behavior
in the two pointing use cases (i.e., for inside and outside objects), especially
when analyzing the precision of the three modalities: eye, head, and finger.
We conclude that there is no single modality that is solely optimal for all
cases as each modality reveals certain limitations. Fusion of multiple
modalities exploits the relevant characteristics of each modality, hence
overcoming the case-dependent limitations of each individual modality.
Ultimately, we propose a method to identify whether the driver's referenced
object lies inside or outside the vehicle, based on the predicted pointing
direction.
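To make the described pipeline more concrete, the sketch below (PyTorch) illustrates one way such a system could be organized: per-modality encoders for the tracked eye-gaze, head and finger cues, a fused representation that regresses the pointing direction, an inside/outside decision taken from that predicted direction, and a speech-triggered entry point that delimits a referencing event. The class and function names (`ReferencingFusionNet`, `on_speech_trigger`), the layer sizes, the 3-D direction-vector inputs, and the label order are illustrative assumptions, not details taken from the paper.

```python
# Minimal, hypothetical sketch of a multimodal referencing pipeline in PyTorch.
# It is NOT the paper's actual architecture: layer sizes, the 3-D direction-vector
# inputs, and the label order (0 = inside, 1 = outside) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReferencingFusionNet(nn.Module):
    """Fuses eye-gaze, head and finger-pointing cues for deictic referencing."""

    def __init__(self, feat_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        # One small encoder per modality; each input is assumed to be a 3-D
        # direction vector in vehicle coordinates.
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
            for name in ("gaze", "head", "finger")
        })
        # Fusion layer over the concatenated per-modality features.
        self.fusion = nn.Sequential(nn.Linear(3 * hidden_dim, hidden_dim), nn.ReLU())
        # Head 1: regress the driver's pointing direction (unit 3-D vector).
        self.direction_head = nn.Linear(hidden_dim, 3)
        # Head 2: decide inside vs. outside from the *predicted* direction,
        # mirroring the abstract's description of the final step.
        self.inside_outside_head = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 2)
        )

    def forward(self, gaze, head, finger):
        fused = self.fusion(torch.cat([
            self.encoders["gaze"](gaze),
            self.encoders["head"](head),
            self.encoders["finger"](finger),
        ], dim=-1))
        direction = F.normalize(self.direction_head(fused), dim=-1)
        inside_outside_logits = self.inside_outside_head(direction)
        return direction, inside_outside_logits


def on_speech_trigger(model, gaze, head, finger):
    """Handle one referencing event; the speech command is assumed to have
    already been detected and is only used to delimit the event."""
    with torch.no_grad():
        direction, logits = model(gaze, head, finger)
    is_outside = logits.argmax(dim=-1) == 1
    return direction, is_outside


if __name__ == "__main__":
    model = ReferencingFusionNet()
    # Dummy tracked modalities for a single referencing event.
    gaze, head, finger = (torch.randn(1, 3) for _ in range(3))
    direction, is_outside = on_speech_trigger(model, gaze, head, finger)
    print(direction, is_outside)
```

Splitting the model into a direction regressor followed by an inside/outside classifier mirrors the abstract's statement that the final decision is based on the predicted pointing direction; in a real system the classifier could equally be replaced by a geometric test against the cabin boundary.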
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers [0.0]
The rapid advancement of the automotive industry has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving related tasks, such as referencing objects outside of the vehicle.
We propose IcRegress, a novel regression-based incremental learning approach that adapts to changing behavior and the unique characteristics of drivers engaged in the dual task of driving and referencing objects.
arXiv Detail & Related papers (2024-01-29T12:48:56Z)
- Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences [53.353022588751585]
We present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences.
We introduce three distinct methods to infer human preferences by leveraging different types of interactions.
We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR.
arXiv Detail & Related papers (2023-12-14T21:00:56Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
- ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle [0.0]
We propose a learning-based multimodal fusion approach for referencing outside-the-vehicle objects while maintaining a long driving route in a simulated environment.
We also demonstrate possible ways to exploit behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver.
arXiv Detail & Related papers (2021-11-03T16:22:17Z)
- Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects [0.0]
We utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle.
We use features from gaze, head pose and finger pointing simultaneously to precisely predict the referenced objects in different car poses.
arXiv Detail & Related papers (2021-07-26T12:37:06Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.