Visual Attention Prediction Improves Performance of Autonomous Drone
Racing Agents
- URL: http://arxiv.org/abs/2201.02569v2
- Date: Mon, 10 Jan 2022 13:55:23 GMT
- Title: Visual Attention Prediction Improves Performance of Autonomous Drone
Racing Agents
- Authors: Christian Pfeiffer, Simon Wengeler, Antonio Loquercio, Davide
Scaramuzza
- Abstract summary: Humans race drones faster than neural networks trained for end-to-end autonomous flight.
This work investigates whether neural networks that imitate human eye-gaze behavior and attention can improve controller performance in vision-based autonomous drone racing.
- Score: 45.36060508554703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans race drones faster than neural networks trained for end-to-end
autonomous flight. This may be related to the ability of human pilots to select
task-relevant visual information effectively. This work investigates whether
neural networks capable of imitating human eye gaze behavior and attention can
improve neural network performance for the challenging task of vision-based
autonomous drone racing. We hypothesize that gaze-based attention prediction
can be an efficient mechanism for visual information selection and decision
making in a simulator-based drone racing task. We test this hypothesis using
eye gaze and flight trajectory data from 18 human drone pilots to train a
visual attention prediction model. We then use this visual attention prediction
model to train an end-to-end controller for vision-based autonomous drone
racing using imitation learning. We compare the drone racing performance of the
attention-prediction controller to that of controllers using raw image inputs and
image-based abstractions (i.e., feature tracks). Our results show that
attention-prediction-based controllers outperform the baselines and consistently
complete a challenging race track with up to an 88% success rate. Furthermore,
visual attention-prediction and feature-track based models showed better
generalization performance than image-based models when evaluated on hold-out
reference trajectories. Our results demonstrate that human visual attention
prediction improves the performance of autonomous vision-based drone racing
agents and provides an essential step towards vision-based, fast, and agile
autonomous flight that can eventually reach and even exceed human performance.
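The abstract describes a two-stage pipeline: a visual attention prediction model trained on human gaze data, followed by an attention-conditioned end-to-end controller trained with imitation learning. The sketch below is a minimal PyTorch illustration of how such a pipeline could be wired up; it is not the authors' released code, and the gaze-to-heatmap conversion, network sizes, and the 4-D command output are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the two-stage pipeline described above:
# (1) an attention predictor supervised with human gaze heatmaps, and
# (2) an imitation-learning controller conditioned on the predicted attention map.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaze_to_heatmap(fixations, h=96, w=128, sigma=6.0):
    """Turn pixel-space gaze fixations [(x, y), ...] into a normalized attention
    heatmap by placing a Gaussian blob at each fixation (assumed preprocessing)."""
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    heat = torch.zeros(h, w)
    for x, y in fixations:
        heat += torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heat / (heat.sum() + 1e-8)  # normalize to a spatial probability map


class AttentionPredictor(nn.Module):
    """Tiny encoder-decoder mapping an RGB frame to a gaze-attention map."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, img):                          # img: (B, 3, H, W)
        logits = self.dec(self.enc(img))             # (B, 1, H, W)
        b, _, h, w = logits.shape
        return F.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)


class AttentionConditionedController(nn.Module):
    """Policy that weights the image by the predicted attention map and
    regresses a 4-D control command (e.g. thrust and body rates, assumed)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, img, att):
        att = F.interpolate(att, size=img.shape[-2:], mode="bilinear",
                            align_corners=False)
        return self.head(self.backbone(img * att))


# Stage 1: supervise AttentionPredictor with gaze heatmaps (e.g. a KL loss).
# Stage 2: freeze it, then train the controller by imitation of expert commands.
```

In such a setup the attention predictor would be fit to heatmaps derived from the recorded fixations and then frozen, while the controller is trained to match the human pilots' commands along the reference trajectories.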
Related papers
- VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training [8.479135285935113]
Humans excel at efficiently navigating through crowds without collision by focusing on specific visual regions relevant to navigation.
Most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects.
We propose a Self-Supervised Vision-Action Model for Visual Navigation Pre-Training (VANP).
arXiv Detail & Related papers (2024-03-12T22:33:08Z)
- Humanoid Locomotion as Next Token Prediction [84.21335675130021]
Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories.
We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot.
Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training, such as walking backward.
arXiv Detail & Related papers (2024-02-29T18:57:37Z)
- BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving [24.123577277806135]
We pioneer a novel behavior-aware trajectory prediction model (BAT).
Our model consists of behavior-aware, interaction-aware, priority-aware, and position-aware modules.
We evaluate BAT's performance across the Next Generation Simulation (NGSIM), Highway Drone (HighD), Roundabout Drone (RounD), and Macao Connected Autonomous Driving (MoCAD) datasets.
arXiv Detail & Related papers (2023-12-11T13:27:51Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Fully neuromorphic vision and control for autonomous drone flight [5.358212984063069]
Event-based vision and spiking neural hardware promise to exhibit similar characteristics.
Here, we present a fully learned neuromorphic pipeline for controlling a flying drone.
Results illustrate the potential of neuromorphic sensing and processing for enabling smaller networks for autonomous flight.
arXiv Detail & Related papers (2023-03-15T17:19:45Z)
- Towards Cooperative Flight Control Using Visual-Attention [61.99121057062421]
We propose a vision-based air-guardian system to enable parallel autonomy between a pilot and a control system.
Our attention-based air-guardian system can balance the trade-off between its level of involvement in the flight and the pilot's expertise and attention.
arXiv Detail & Related papers (2022-12-21T15:31:47Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Masked Visual Pre-training for Motor Control [118.18189211080225]
Self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.
We freeze the visual encoder and train neural network controllers on top with reinforcement learning (a generic sketch of this frozen-encoder pattern appears after the list below).
This is the first self-supervised model to exploit real-world images at scale for motor control.
arXiv Detail & Related papers (2022-03-11T18:58:10Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures this capability.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)
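The Masked Visual Pre-training entry above mentions freezing a pre-trained visual encoder and training controllers on top with reinforcement learning. The fragment below is a generic sketch of that frozen-encoder pattern, not the code of any paper listed here; the encoder architecture, observation shape, and 4-D action are assumptions, and the RL update itself (e.g. PPO) is omitted.

```python
# Generic sketch of the "frozen visual encoder + small controller on top" pattern.
import torch
import torch.nn as nn

encoder = nn.Sequential(                      # stand-in for a pre-trained visual encoder
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in encoder.parameters():                # freeze the pre-trained weights
    p.requires_grad = False
encoder.eval()

policy_head = nn.Sequential(                  # only this part would be trained with RL
    nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 4),
)
optimizer = torch.optim.Adam(policy_head.parameters(), lr=3e-4)

obs = torch.randn(1, 3, 84, 84)               # dummy camera frame
with torch.no_grad():
    features = encoder(obs)                   # frozen visual features
action = policy_head(features)                # the RL algorithm updates only the head
```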