Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids
- URL: http://arxiv.org/abs/2308.14064v1
- Date: Sun, 27 Aug 2023 10:32:52 GMT
- Title: Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids
- Authors: Xinyi Wang, Xuan Cui, Danxu Li, Fang Liu, Licheng Jiao
- Abstract summary: We present an aerial navigation task for the 2023 ICCV Conversation History.
We propose an effective fusion-training method for the Human Attention Aided Transformer (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models.
- Score: 69.98258892165767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drones are now widely used in many areas of daily life. Natural-language control relieves people of the burden of holding a controller at all times and makes drones easier to operate for users with disabilities or occupied hands. However, controlling an aerial robot is more complicated than controlling a ground robot due to factors such as the additional altitude dimension. It is therefore crucial to develop an intelligent UAV that can talk to humans and follow natural language commands. In this report, we present an aerial navigation task for the 2023 ICCV Conversation History. Based on the AVDN dataset, which contains more than 3k recorded navigation trajectories with asynchronous human-robot conversations, we propose an effective fusion-training method for the Human Attention Aided Transformer (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models, which jointly predict navigation waypoints and human attention. The method not only achieves a high success rate (SR) and success weighted by path length (SPL), but also improves goal progress (GP) by 7% over the baseline model.
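The report does not spell out the fusion mechanism, so the sketch below shows one plausible reading: late fusion, in which the waypoint and attention predictions of the two independently trained models are combined with a convex weight. Everything here (the `MockNavigator` stand-in, the output shapes, the weight `alpha`) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical late fusion of two waypoint predictors (a sketch, not the
# paper's code). Each model maps an observation to a 2-D waypoint and a
# flattened human-attention map; fusion is a fixed convex combination.
import numpy as np

rng = np.random.default_rng(0)

class MockNavigator:
    """Stand-in for HAA-Transformer / HAA-LSTM: obs -> (waypoint, attention)."""
    def __init__(self, dim: int):
        self.w_nav = rng.normal(size=(dim, 2))   # waypoint head
        self.w_att = rng.normal(size=(dim, 16))  # 4x4 attention-map head

    def predict(self, obs: np.ndarray):
        waypoint = obs @ self.w_nav              # (2,) x/y offset
        logits = obs @ self.w_att
        attention = np.exp(logits) / np.exp(logits).sum()  # softmax map
        return waypoint, attention

def fuse(model_a, model_b, obs, alpha=0.5):
    """Convex combination of both models' outputs (assumed scheme)."""
    (wp_a, att_a), (wp_b, att_b) = model_a.predict(obs), model_b.predict(obs)
    return alpha * wp_a + (1 - alpha) * wp_b, alpha * att_a + (1 - alpha) * att_b

obs = rng.normal(size=32)
waypoint, attention = fuse(MockNavigator(32), MockNavigator(32), obs)
print(waypoint.shape, attention.shape)  # (2,) (16,)
```

In practice the fusion weight could itself be trained jointly with both models, which may be closer to what "fusion training" refers to in the report.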
Related papers
- Learning Human Perception Dynamics for Informative Robot Communication [21.170542003568674]
CoNav-Maze is a simulated robotics environment where a robot navigates using local perception while a human operator provides guidance based on an inaccurate map.
To enable efficient human-robot cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS).
Central to IG-MCTS is a neural human perception dynamics model that estimates how humans distill information from robot communications.
arXiv Detail & Related papers (2025-02-03T22:08:04Z)
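As a rough illustration of where an information-gain term can enter MCTS in the IG-MCTS entry above, the sketch below adds an expected-information-gain bonus to standard UCT selection. The paper's learned human perception dynamics model is reduced to a random stub, and every identifier is hypothetical.

```python
# Generic UCT selection with an information-gain bonus (illustrative only;
# the published IG-MCTS pairs this idea with a learned human perception
# dynamics model, stubbed out here as `expected_info_gain`).
import math, random

random.seed(0)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def expected_info_gain(state) -> float:
    """Stub for a model of how much communicating from `state` would
    reduce the human operator's uncertainty about the robot's map."""
    return random.random()

def uct_score(child, parent_visits, c=1.4, beta=0.5):
    exploit = child.value / (child.visits + 1e-9)
    explore = c * math.sqrt(math.log(parent_visits + 1) / (child.visits + 1e-9))
    return exploit + explore + beta * expected_info_gain(child.state)

def select(node):
    """Descend the tree, always taking the child with the best score."""
    while node.children:
        node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
    return node

root = Node("start")
root.children = [Node(f"option-{i}", root) for i in range(3)]
root.visits = 1
print(select(root).state)
```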
- Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains [3.1043493260209805]
This work proposes an action recognition and control approach based on a two-layer Long Short-Term Memory (LSTM) deep neural network followed by three densely connected layers, with Federated Learning (FL) embedded across multiple drones.
Experiments with real robots achieved an accuracy greater than 96%.
arXiv Detail & Related papers (2024-12-03T21:57:04Z)
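A minimal sketch of the architecture described in this entry, two LSTM layers feeding three dense layers, combined with naive federated averaging across drones, follows. Layer sizes, the input features, and the FedAvg details are assumptions; the summary does not specify them.

```python
# Two-layer LSTM + three dense layers with naive FedAvg across drones
# (a sketch under assumed shapes, not the paper's exact network).
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, n_features=34, n_classes=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Sequential(            # three densely connected layers
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):                     # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # classify from last time step

def fed_avg(models):
    """Average per-drone model parameters into one global model (FedAvg)."""
    global_model = GestureLSTM()
    avg = {k: torch.stack([m.state_dict()[k] for m in models]).mean(dim=0)
           for k in global_model.state_dict()}
    global_model.load_state_dict(avg)
    return global_model

drones = [GestureLSTM() for _ in range(3)]    # each trained locally in practice
print(fed_avg(drones)(torch.randn(1, 30, 34)).shape)  # torch.Size([1, 5])
```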
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
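The summary above does not say how control is arbitrated between operator and agent; one common shared-control scheme is a confidence-weighted blend of the two commands, sketched here with hypothetical names and units.

```python
# Confidence-weighted blending of human and assistive-agent end-effector
# commands (a generic shared-control sketch, not the paper's scheme).
import numpy as np

def blend_command(human_twist, agent_twist, confidence):
    """Linear arbitration: higher agent confidence shifts control to the agent."""
    alpha = float(np.clip(confidence, 0.0, 1.0))
    return (1 - alpha) * np.asarray(human_twist) + alpha * np.asarray(agent_twist)

human = [0.10, 0.00, -0.05]  # operator's velocity command (m/s)
agent = [0.08, 0.02, -0.04]  # assistive agent's suggestion (m/s)
print(blend_command(human, agent, confidence=0.3))
```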
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs enables instruction-following data collection by robots that can align with human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
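As an illustration of the general auto-zoom idea in the AZTR entry above (not its actual implementation), the snippet below crops a margin around a detected human bounding box and rescales the crop to a fixed input size, so that distant aerial targets occupy more of the frame.

```python
# Illustrative auto-zoom step: crop around a person detection and rescale.
# The detector itself is assumed given; box format is (x, y, w, h) pixels.
import numpy as np
import cv2  # pip install opencv-python

def auto_zoom(frame, box, out_size=224, margin=0.2):
    x, y, w, h = box
    mx, my = int(w * margin), int(h * margin)          # context margin
    x0, y0 = max(x - mx, 0), max(y - my, 0)
    x1, y1 = min(x + w + mx, frame.shape[1]), min(y + h + my, frame.shape[0])
    crop = frame[y0:y1, x0:x1]
    return cv2.resize(crop, (out_size, out_size))      # scale target up

frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # stand-in aerial frame
print(auto_zoom(frame, box=(600, 300, 40, 90)).shape)  # (224, 224, 3)
```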
- Aerial Vision-and-Dialog Navigation [10.596163697911525]
We introduce Aerial Vision-and-Dialog Navigation (AVDN), a task of navigating a drone via natural language conversation.
We build a drone simulator with a continuous environment and collect a new AVDN dataset of over 3k recorded navigation trajectories.
We propose an effective Human Attention Aided Transformer model (HAA-Transformer) which learns to predict both navigation waypoints and human attention.
arXiv Detail & Related papers (2022-05-24T17:28:14Z)
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Motion planning that takes human comfort into account is typically missing from the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
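For readers unfamiliar with the MPC pattern used in the entry above, here is a bare-bones receding-horizon loop based on random shooting: optimize a control sequence against reaching and smoothness costs, apply only the first control, and re-plan. The dynamics, costs, and solver are stand-ins, not the paper's formulation.

```python
# Bare-bones receding-horizon MPC via random shooting (structure only).
import numpy as np

rng = np.random.default_rng(1)

def rollout_cost(pos, controls, hand, smooth_w=0.5, dt=0.1):
    """Cost of a candidate control sequence under a trivial kinematic model."""
    cost, prev_u = 0.0, np.zeros(3)
    for u in controls:
        pos = pos + u * dt                            # integrate velocity
        cost += np.sum((pos - hand) ** 2)             # reach the human's hand
        cost += smooth_w * np.sum((u - prev_u) ** 2)  # penalize abrupt motion
        prev_u = u
    return cost

def mpc_step(pos, hand, horizon=10, samples=256, u_max=0.5):
    cands = rng.uniform(-u_max, u_max, size=(samples, horizon, 3))
    best = min(cands, key=lambda c: rollout_cost(pos, c, hand))
    return best[0]                                    # apply first control only

pos, hand = np.zeros(3), np.array([0.4, 0.2, 0.3])
for _ in range(20):                                   # receding-horizon loop
    pos = pos + mpc_step(pos, hand) * 0.1
print(np.round(pos, 2))                               # end-effector near hand
```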
- Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.