Related papers: Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle

Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle

URL: http://arxiv.org/abs/2202.07360v1
Date: Tue, 15 Feb 2022 12:40:13 GMT
Title: Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle
Authors: Abdul Rafey Aftab, Michael von der Beeck
Abstract summary: We use multiple modalities to achieve natural human-machine interaction for a specific task. By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture. We propose a method to identity whether the driver's referenced object lies inside or outside the vehicle.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Advanced in-cabin sensing technologies, especially vision based approaches, have tremendously progressed user interaction inside the vehicle, paving the way for new applications of natural user interaction. Just as humans use multiple modes to communicate with each other, we follow an approach which is characterized by simultaneously using multiple modalities to achieve natural human-machine interaction for a specific task: pointing to or glancing towards objects inside as well as outside the vehicle for deictic references. By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture using a deep neural network to precisely identify the driver's referencing intent. Additionally, we use a speech command as a trigger to separate each referencing event. We observe differences in driver behavior in the two pointing use cases (i.e. for inside and outside objects), especially when analyzing the preciseness of the three modalities eye, head, and finger. We conclude that there is no single modality that is solely optimal for all cases as each modality reveals certain limitations. Fusion of multiple modalities exploits the relevant characteristics of each modality, hence overcoming the case dependent limitations of each individual modality. Ultimately, we propose a method to identity whether the driver's referenced object lies inside or outside the vehicle, based on the predicted pointing direction.

Related papers

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing [79.80132157576978]
Pointing serves as a fundamental and intuitive mechanism for grounding language within visual contexts.<n>We introduce PointArena, a comprehensive platform for evaluating multimodal pointing across diverse reasoning scenarios.
arXiv Detail & Related papers (2025-05-15T06:04:42Z)
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout. DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers [0.0]
The rapid advancement of the automotive industry has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving related tasks, such as referencing objects outside of the vehicle. We propose textitIcRegress, a novel regression-based incremental learning approach that adapts to changing behavior and the unique characteristics of drivers engaged in the dual task of driving and referencing objects.
arXiv Detail & Related papers (2024-01-29T12:48:56Z)
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences [53.353022588751585]
We present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences. We introduce three distinct methods to infer human preferences by leveraging different types of interactions. We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR.
arXiv Detail & Related papers (2023-12-14T21:00:56Z)
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles. An evaluation of TCG and Drive&Act datasets is provided to showcase the promising performance of our approach. We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle [0.0]
We propose a learning-based multimodal fusion approach for referencing outside-the-vehicle objects while maintaining a long driving route in a simulated environment. We also demonstrate possible ways to exploit behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver.
arXiv Detail & Related papers (2021-11-03T16:22:17Z)
Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects [0.0]
We utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle. We use features from gaze, head pose and finger pointing simultaneously to precisely predict the referenced objects in different car poses.
arXiv Detail & Related papers (2021-07-26T12:37:06Z)
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images. Our approach is fully automatic without any human interaction. We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing. Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints. We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.