Multi-model fusion for Aerial Vision and Dialog Navigation based on
human attention aids
- URL: http://arxiv.org/abs/2308.14064v1
- Date: Sun, 27 Aug 2023 10:32:52 GMT
- Title: Multi-model fusion for Aerial Vision and Dialog Navigation based on
human attention aids
- Authors: Xinyi Wang, Xuan Cui, Danxu Li, Fang Liu, Licheng Jiao
- Abstract summary: We present our method for an aerial navigation task in the AVDN Challenge at ICCV CLVL 2023.
We propose an effective fusion-training method for the Human Attention Aided Transformer (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models.
- Score: 69.98258892165767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drones have been widely used in many areas of our daily lives.
Controlling a drone through natural language relieves people of the burden of
holding a controller all the time and makes drones easier to use for people
with disabilities or occupied hands. However,
the control of aerial robots is more complicated compared to normal robots due
to factors such as uncontrollable height. Therefore, it is crucial to develop
an intelligent UAV that has the ability to talk to humans and follow natural
language commands. In this report, we present an aerial navigation task for the
2023 ICCV Conversation History. Based on the AVDN dataset containing more than
3k recorded navigation trajectories and asynchronous human-robot conversations,
we propose an effective method of fusion training of Human Attention Aided
Transformer model (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM)
model, which achieves the prediction of the navigation routing points and human
attention. The method not only achieves high SR and SPL metrics, but also shows
a 7% improvement in GP metrics compared to the baseline model.
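SR (success rate), SPL (success weighted by path length), and GP (goal progress) are standard embodied-navigation metrics. As a minimal sketch of how they are typically computed, the function below assumes per-episode distances are available; the field names and the 5 m success radius are illustrative choices, not values taken from the paper.

```python
def vln_metrics(episodes, success_radius=5.0):
    """Compute SR, SPL, and GP over a batch of navigation episodes.

    Each episode is a dict with (hypothetical) fields:
      - 'path_length':   distance actually traveled by the agent
      - 'shortest_path': shortest-path distance from start to goal
      - 'final_dist':    agent's final distance to the goal
      - 'start_dist':    agent's initial distance to the goal
    """
    n = len(episodes)
    sr = spl = gp = 0.0
    for ep in episodes:
        # SR: episode counts as a success if the agent stops near the goal
        success = 1.0 if ep['final_dist'] <= success_radius else 0.0
        sr += success
        # SPL: success weighted by how efficient the taken path was
        spl += success * ep['shortest_path'] / max(ep['path_length'],
                                                   ep['shortest_path'])
        # GP: how much closer to the goal the agent finished than it started
        gp += ep['start_dist'] - ep['final_dist']
    return sr / n, spl / n, gp / n
```

A relative GP improvement, as reported above, would then compare the mean GP of the fused model against the baseline's.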
Related papers
- Combating Spatial Disorientation in a Dynamic Self-Stabilization Task Using AI Assistants [5.42300240053097]
Spatial disorientation is a leading cause of fatal aircraft accidents.
This paper explores the potential of AI agents to aid pilots in maintaining balance and preventing unrecoverable losses of control.
arXiv Detail & Related papers (2024-09-09T21:06:22Z) - Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z) - AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z) - Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog
Navigation [10.25089706534778]
This report details the methods of the winning entry of the AVDN Challenge in ICCV CLVL 2023.
It addresses the Aerial Navigation from Dialog History (ANDH) task, which requires a drone agent to associate dialog history with aerial observations to reach the destination.
For better cross-modal grounding abilities of the drone agent, we propose a Target-Grounded Graph-Aware Transformer (TG-GAT) framework.
arXiv Detail & Related papers (2023-08-22T16:45:35Z) - AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal
Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z) - Aerial Vision-and-Dialog Navigation [10.596163697911525]
We introduce Aerial Vision-and-Dialog Navigation (AVDN), to navigate a drone via natural language conversation.
We build a drone simulator with a continuous environment and collect a new AVDN dataset of over 3k recorded navigation trajectories.
We propose an effective Human Attention Aided Transformer model (HAA-Transformer) which learns to predict both navigation waypoints and human attention.
arXiv Detail & Related papers (2022-05-24T17:28:14Z) - Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Planning motions that take human comfort into account is not a part of the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z) - Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of
Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.