Aerial Vision-and-Dialog Navigation
- URL: http://arxiv.org/abs/2205.12219v3
- Date: Thu, 1 Jun 2023 06:39:11 GMT
- Title: Aerial Vision-and-Dialog Navigation
- Authors: Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang
- Abstract summary: We introduce Aerial Vision-and-Dialog Navigation (AVDN), the task of navigating a drone via natural language conversation.
We build a drone simulator with a continuous environment and collect a new AVDN dataset of over 3k recorded navigation trajectories.
We propose an effective Human Attention Aided Transformer model (HAA-Transformer) which learns to predict both navigation waypoints and human attention.
- Score: 10.596163697911525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to converse with humans and follow natural language commands is
crucial for intelligent unmanned aerial vehicles (a.k.a. drones). It can
relieve people's burden of holding a controller all the time, allow
multitasking, and make drone control more accessible for people with
disabilities or with their hands occupied. To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), the task of navigating a drone via natural language conversation. We build a drone simulator with a continuous photorealistic
environment and collect a new AVDN dataset of over 3k recorded navigation
trajectories with asynchronous human-human dialogs between commanders and
followers. The commander provides initial navigation instruction and further
guidance by request, while the follower navigates the drone in the simulator
and asks questions when needed. During data collection, followers' attention on
the drone's visual observation is also recorded. Based on the AVDN dataset, we
study the tasks of aerial navigation from (full) dialog history and propose an
effective Human Attention Aided Transformer model (HAA-Transformer), which
learns to predict both navigation waypoints and human attention.
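To make the model concrete, below is a minimal PyTorch sketch of the multi-task idea the abstract describes: a shared transformer encoder over dialog and visual tokens, with one head regressing the next waypoint and another predicting a human-attention map over the visual patches. All dimensions, the tokenization, and the pooling scheme are illustrative assumptions, not the published HAA-Transformer architecture.

```python
# Minimal sketch of a multi-task model in the spirit of HAA-Transformer:
# a shared encoder over dialog and visual tokens, one head predicting the
# next navigation waypoint, another predicting a human-attention map.
# All shapes and design choices here are assumptions, not the paper's model.
import torch
import torch.nn as nn

class HAASketch(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=4, vis_tokens=49):
        super().__init__()
        self.dialog_proj = nn.Linear(300, d_model)   # e.g. word embeddings
        self.visual_proj = nn.Linear(512, d_model)   # e.g. CNN patch features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.waypoint_head = nn.Linear(d_model, 2)    # (dx, dy) waypoint
        self.attention_head = nn.Linear(d_model, 1)   # per-patch attention logit
        self.vis_tokens = vis_tokens

    def forward(self, dialog_emb, visual_feats):
        # dialog_emb: (B, L, 300), visual_feats: (B, 49, 512)
        tokens = torch.cat([self.dialog_proj(dialog_emb),
                            self.visual_proj(visual_feats)], dim=1)
        h = self.encoder(tokens)
        waypoint = self.waypoint_head(h[:, 0])        # pool via first dialog token
        vis_h = h[:, -self.vis_tokens:]
        attn_map = self.attention_head(vis_h).squeeze(-1)   # (B, 49) logits
        return waypoint, attn_map

model = HAASketch()
wp, attn = model(torch.randn(2, 12, 300), torch.randn(2, 49, 512))
# Training would combine a waypoint regression loss with a loss matching
# attn to the recorded human attention, e.g. MSE or KL over the heatmap.
```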
Related papers
- CoNav: A Benchmark for Human-Centered Collaborative Navigation [66.6268966718022]
We propose a collaborative navigation (CoNav) benchmark.
Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities.
We propose an intention-aware agent for reasoning both long-term and short-term human intention.
arXiv Detail & Related papers (2024-06-04T15:44:25Z)
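As an illustration of what an intention-aware agent might compute, here is a hedged sketch: a recurrent encoder over observed human positions with two heads, one for a long-term goal and one for the short-term next step. The shapes and names are assumptions, not the CoNav authors' model.

```python
# Illustrative sketch (not the CoNav authors' code) of an "intention-aware"
# head: from an observed human trajectory, predict a long-term goal position
# and a short-term next displacement. All shapes and names are assumptions.
import torch
import torch.nn as nn

class IntentionHead(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.long_term = nn.Linear(hidden, 2)    # predicted goal (x, y)
        self.short_term = nn.Linear(hidden, 2)   # predicted next step (dx, dy)

    def forward(self, human_xy):
        # human_xy: (B, T, 2) observed human positions
        _, h = self.encoder(human_xy)
        h = h[-1]                                # final hidden state, (B, hidden)
        return self.long_term(h), self.short_term(h)

goal, step = IntentionHead()(torch.randn(4, 20, 2))
```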
- Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids [69.98258892165767]
We present our solution to the Aerial Navigation from Dialog History task of the 2023 ICCV AVDN Challenge.
We propose an effective fusion-training method combining the Human Attention Aided Transformer (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models.
arXiv Detail & Related papers (2023-08-27T10:32:52Z)
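The summary does not spell out the fusion mechanism, so here is one minimal, hypothetical reading: late fusion of the two models' waypoint outputs with a learned mixing weight. The model interfaces are assumptions.

```python
# A minimal sketch of late fusion between two waypoint predictors, assuming
# each model's forward() returns a waypoint tensor; the learned gate and the
# interfaces are illustrative, not the paper's exact fusion-training scheme.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, model_a: nn.Module, model_b: nn.Module):
        super().__init__()
        self.model_a, self.model_b = model_a, model_b
        self.gate = nn.Parameter(torch.zeros(1))   # learned mixing weight

    def forward(self, *inputs):
        wa = self.model_a(*inputs)    # waypoint from the transformer model
        wb = self.model_b(*inputs)    # waypoint from the LSTM model
        alpha = torch.sigmoid(self.gate)
        return alpha * wa + (1 - alpha) * wb

# Hypothetical usage: LateFusion(haa_transformer, haa_lstm)(dialog, visual)
```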
- TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z)
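For intuition, a rough sketch of a spatio-temporal detector in this spirit: per-frame convolutional features followed by temporal self-attention over the clip, with a per-frame detection head. Backbone, head, and shapes are assumptions, not TransVisDrone's actual design.

```python
# Rough sketch of a spatio-temporal detector: per-frame CNN features, then
# temporal self-attention across the clip. Illustrative only.
import torch
import torch.nn as nn

class SpatioTemporalDetector(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        layer = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, 2)
        self.head = nn.Linear(d_model, 5)   # per-frame (score, cx, cy, w, h)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) video frames
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).flatten(1)   # (B*T, d_model)
        feats = feats.view(b, t, -1)
        return self.head(self.temporal(feats))                 # (B, T, 5)

out = SpatioTemporalDetector()(torch.randn(1, 8, 3, 128, 128))
```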
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
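The composition LM-Nav describes can be sketched as a three-stage pipeline; the helper functions below are hypothetical stand-ins for the three pre-trained models (a language model to parse landmarks, CLIP to ground them in graph-node images, ViNG to traverse the graph), not LM-Nav's real interfaces.

```python
# High-level sketch of the three-model composition described above. Every
# function here is a hypothetical stand-in for a pre-trained model.
def extract_landmarks(instruction: str) -> list[str]:
    # Stand-in for prompting a language model to list landmarks in order.
    return [p.strip() for p in instruction.split("then")]

def ground_landmark(landmark: str, node_images: dict) -> str:
    # Stand-in for CLIP: score each graph node's image against the landmark
    # text and return the best-matching node id (dummy character overlap).
    return max(node_images, key=lambda n: len(set(landmark) & set(n)))

def navigate(route: list[str]) -> None:
    # Stand-in for ViNG: follow graph edges between the grounded nodes.
    print(" -> ".join(route))

instruction = "go to the stop sign then the blue building then the picnic table"
nodes = {"stop_sign": None, "blue_building": None, "picnic_table": None}
route = [ground_landmark(lm, nodes) for lm in extract_landmarks(instruction)]
navigate(route)
```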
- Vision-based Drone Flocking in Outdoor Environments [9.184987303791292]
This letter proposes a vision-based detection and tracking algorithm for drone swarms.
We employ a convolutional neural network to detect and localize nearby agents onboard the quadcopters in real-time.
We show that the drones can safely navigate in an outdoor environment despite substantial background clutter and difficult lighting conditions.
arXiv Detail & Related papers (2020-12-02T14:44:40Z)
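As a worked example of the geometry such an onboard detector can exploit, the pinhole-camera approximation below converts a detection box into a rough relative position, given a known physical drone size. The constants are illustrative assumptions.

```python
# Worked example: recover a rough relative position from a detection box via
# a pinhole-camera approximation and a known drone width. Numbers are
# illustrative, not the paper's calibration.
def relative_position(bbox, drone_width_m=0.3, focal_px=600.0,
                      cx=320.0, cy=240.0):
    """bbox = (x_min, y_min, x_max, y_max) in pixels."""
    x0, y0, x1, y1 = bbox
    width_px = x1 - x0
    depth = drone_width_m * focal_px / width_px      # similar triangles
    u, v = (x0 + x1) / 2.0, (y0 + y1) / 2.0          # bbox center
    x = (u - cx) * depth / focal_px                  # right of camera
    y = (v - cy) * depth / focal_px                  # below camera
    return x, y, depth

print(relative_position((300, 220, 360, 260)))       # roughly 3 m ahead
```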
- Relative Drone-Ground Vehicle Localization using LiDAR and Fisheye Cameras through Direct and Indirect Observations [0.0]
We present a LiDAR-camera-based relative pose estimation method between a drone and a ground vehicle.
We propose a dynamically adaptive kernel-based method for drone detection and tracking using the LiDAR.
In our experiments, we were able to achieve very fast initial detection and real-time tracking of the drone.
arXiv Detail & Related papers (2020-11-13T16:41:55Z)
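The "dynamically adaptive kernel" is not detailed in the summary; one plausible, clearly hypothetical reading is a clustering neighborhood whose radius grows with range, since LiDAR point density falls off with distance:

```python
# Hypothetical sketch (not the paper's method): grow a cluster of LiDAR
# returns using a per-point radius that scales with range. Constants are
# assumptions.
import numpy as np

def adaptive_radius(points, base=0.2, per_meter=0.02):
    # points: (N, 3) LiDAR returns; radius grows with each point's range.
    return base + per_meter * np.linalg.norm(points, axis=1)

def grow_cluster(points, seed):
    radius = adaptive_radius(points)
    members, frontier = {seed}, [seed]
    while frontier:
        i = frontier.pop()
        d = np.linalg.norm(points - points[i], axis=1)
        for j in np.nonzero(d < radius[i])[0]:
            if j not in members:
                members.add(int(j))
                frontier.append(int(j))
    return sorted(members)

pts = np.vstack([np.random.randn(30, 3) * 0.1 + [5, 0, 2],    # drone-like blob
                 np.random.randn(30, 3) * 0.1 + [20, 5, 3]])  # distant clutter
print(grow_cluster(pts, seed=0))   # should recover only the first blob
```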
- Learn by Observation: Imitation Learning for Drone Patrolling from Videos of A Human Navigator [22.06785798356346]
We propose to let the drone learn patrolling in the air by observing and imitating how a human navigator does it on the ground.
The observation process enables the automatic collection and annotation of data using inter-frame geometric consistency.
A newly designed neural network is trained based on the annotated data to predict appropriate directions and translations.
arXiv Detail & Related papers (2020-08-30T15:20:40Z)
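Automatic annotation via inter-frame geometric consistency can be illustrated with standard tools: estimate a homography from corners tracked between consecutive frames and read a coarse motion label from its translation. The thresholds and label rule below are illustrative, not the paper's exact procedure.

```python
# Sketch of homography-based auto-labeling between consecutive frames.
# Label rule and thresholds are illustrative assumptions.
import cv2
import numpy as np

def motion_label(prev_gray, cur_gray):
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    assert pts is not None, "no trackable corners found"
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good_prev = pts[status.ravel() == 1]
    good_next = nxt[status.ravel() == 1]
    H, _ = cv2.findHomography(good_prev, good_next, cv2.RANSAC, 3.0)
    dx, dy = H[0, 2], H[1, 2]           # dominant image translation
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "forward" if dy < 0 else "backward"
```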
- Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
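The "optimal control as a supervisor" recipe reduces to supervised learning on planner-generated labels; the sketch below uses a trivial stand-in planner and random placeholder observations to show the shape of the training loop, not the paper's controller or HumANav data.

```python
# Conceptual sketch: a model-based planner labels each observation with the
# waypoint it would choose, and a perception network regresses that label.
# The planner and data here are trivial stand-ins.
import torch
import torch.nn as nn

def planner_waypoint(robot_xy, goal_xy, step=0.5):
    # Stand-in for an optimal controller: step toward the goal.
    direction = goal_xy - robot_xy
    return robot_xy + step * direction / (direction.norm() + 1e-8)

policy = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128),
                       nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(100):                      # supervised loop on synthetic data
    image = torch.randn(8, 1, 64, 64)     # placeholder observations
    robot = torch.zeros(8, 2)
    goal = torch.randn(8, 2)
    label = torch.stack([planner_waypoint(r, g) for r, g in zip(robot, goal)])
    loss = nn.functional.mse_loss(policy(image), label)
    opt.zero_grad()
    loss.backward()
    opt.step()
```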
- Detection and Tracking Meet Drones Challenge [131.31749447313197]
This paper presents a review of object detection and tracking datasets and benchmarks, and discusses the challenges of collecting large-scale drone-based object detection and tracking datasets with manual annotations.
We describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from North to South.
We provide a detailed analysis of the current state of large-scale object detection and tracking on drones, summarize the challenge results, and propose future directions.
arXiv Detail & Related papers (2020-01-16T00:11:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.