Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids
- URL: http://arxiv.org/abs/2308.14064v1
- Date: Sun, 27 Aug 2023 10:32:52 GMT
- Title: Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids
- Authors: Xinyi Wang, Xuan Cui, Danxu Li, Fang Liu, Licheng Jiao
- Abstract summary: We present an aerial navigation task for the 2023 ICCV Conversation History.
We propose an effective fusion-training method for the Human Attention Aided Transformer (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models.
- Score: 69.98258892165767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drones are now widely used in many areas of daily life. Natural-language control relieves people of the burden of holding a controller at all times and makes drones easier to operate for users with disabilities or occupied hands. However, controlling an aerial robot is more complicated than controlling a ground robot due to factors such as the additional altitude dimension. It is therefore crucial to develop an intelligent UAV that can talk to humans and follow natural language commands. In this report, we present an aerial navigation task for the 2023 ICCV Conversation History. Based on the AVDN dataset, which contains more than 3k recorded navigation trajectories with asynchronous human-robot conversations, we propose an effective fusion-training method for the Human Attention Aided Transformer (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models, which jointly predict navigation waypoints and human attention. The method not only achieves a high success rate (SR) and success weighted by path length (SPL), but also improves goal progress (GP) by 7% over the baseline model.
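The report does not spell out the fusion mechanism, so the sketch below shows one plausible reading: late fusion, in which the waypoint and attention predictions of the two independently trained models are combined with a convex weight. Everything here (the `MockNavigator` stand-in, the output shapes, the weight `alpha`) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical late fusion of two waypoint predictors (a sketch, not the
# paper's code). Each model maps an observation to a 2-D waypoint and a
# flattened human-attention map; fusion is a fixed convex combination.
import numpy as np

rng = np.random.default_rng(0)

class MockNavigator:
    """Stand-in for HAA-Transformer / HAA-LSTM: obs -> (waypoint, attention)."""
    def __init__(self, dim: int):
        self.w_nav = rng.normal(size=(dim, 2))   # waypoint head
        self.w_att = rng.normal(size=(dim, 16))  # 4x4 attention-map head

    def predict(self, obs: np.ndarray):
        waypoint = obs @ self.w_nav              # (2,) x/y offset
        logits = obs @ self.w_att
        attention = np.exp(logits) / np.exp(logits).sum()  # softmax map
        return waypoint, attention

def fuse(model_a, model_b, obs, alpha=0.5):
    """Convex combination of both models' outputs (assumed scheme)."""
    (wp_a, att_a), (wp_b, att_b) = model_a.predict(obs), model_b.predict(obs)
    return alpha * wp_a + (1 - alpha) * wp_b, alpha * att_a + (1 - alpha) * att_b

obs = rng.normal(size=32)
waypoint, attention = fuse(MockNavigator(32), MockNavigator(32), obs)
print(waypoint.shape, attention.shape)  # (2,) (16,)
```

In practice the fusion weight could itself be trained jointly with both models, which may be closer to what "fusion training" refers to in the report.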
Related papers
- Learning Human Perception Dynamics for Informative Robot Communication [21.170542003568674]
CoNav-Maze is a simulated robotics environment where a robot navigates using local perception while a human operator provides guidance based on an inaccurate map.
To enable efficient human-robot cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS).
Central to IG-MCTS is a neural human perception dynamics model that estimates how humans distill information from robot communications.
arXiv Detail & Related papers (2025-02-03T22:08:04Z)
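As a rough illustration of where an information-gain term can enter MCTS in the IG-MCTS entry above, the sketch below adds an expected-information-gain bonus to standard UCT selection. The paper's learned human perception dynamics model is reduced to a random stub, and every identifier is hypothetical.

```python
# Generic UCT selection with an information-gain bonus (illustrative only;
# the published IG-MCTS pairs this idea with a learned human perception
# dynamics model, stubbed out here as `expected_info_gain`).
import math, random

random.seed(0)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def expected_info_gain(state) -> float:
    """Stub for a model of how much communicating from `state` would
    reduce the human operator's uncertainty about the robot's map."""
    return random.random()

def uct_score(child, parent_visits, c=1.4, beta=0.5):
    exploit = child.value / (child.visits + 1e-9)
    explore = c * math.sqrt(math.log(parent_visits + 1) / (child.visits + 1e-9))
    return exploit + explore + beta * expected_info_gain(child.state)

def select(node):
    """Descend the tree, always taking the child with the best score."""
    while node.children:
        node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
    return node

root = Node("start")
root.children = [Node(f"option-{i}", root) for i in range(3)]
root.visits = 1
print(select(root).state)
```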
- Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains [3.1043493260209805]
This work proposes an action recognition and control approach based on a two-layer Long Short-Term Memory (LSTM) deep neural network followed by three densely connected layers, with Federated Learning (FL) embedded across multiple drones.
Experiments with real robots achieved an accuracy greater than 96%.
arXiv Detail & Related papers (2024-12-03T21:57:04Z)
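A minimal sketch of the architecture described in this entry, two LSTM layers feeding three dense layers, combined with naive federated averaging across drones, follows. Layer sizes, the input features, and the FedAvg details are assumptions; the summary does not specify them.

```python
# Two-layer LSTM + three dense layers with naive FedAvg across drones
# (a sketch under assumed shapes, not the paper's exact network).
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, n_features=34, n_classes=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Sequential(            # three densely connected layers
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):                     # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # classify from last time step

def fed_avg(models):
    """Average per-drone model parameters into one global model (FedAvg)."""
    global_model = GestureLSTM()
    avg = {k: torch.stack([m.state_dict()[k] for m in models]).mean(dim=0)
           for k in global_model.state_dict()}
    global_model.load_state_dict(avg)
    return global_model

drones = [GestureLSTM() for _ in range(3)]    # each trained locally in practice
print(fed_avg(drones)(torch.randn(1, 30, 34)).shape)  # torch.Size([1, 5])
```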
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
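The summary above does not say how control is arbitrated between operator and agent; one common shared-control scheme is a confidence-weighted blend of the two commands, sketched here with hypothetical names and units.

```python
# Confidence-weighted blending of human and assistive-agent end-effector
# commands (a generic shared-control sketch, not the paper's scheme).
import numpy as np

def blend_command(human_twist, agent_twist, confidence):
    """Linear arbitration: higher agent confidence shifts control to the agent."""
    alpha = float(np.clip(confidence, 0.0, 1.0))
    return (1 - alpha) * np.asarray(human_twist) + alpha * np.asarray(agent_twist)

human = [0.10, 0.00, -0.05]  # operator's velocity command (m/s)
agent = [0.08, 0.02, -0.04]  # assistive agent's suggestion (m/s)
print(blend_command(human, agent, confidence=0.3))
```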
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs enables instruction-following data collection by robots that can align with human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
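As an illustration of the general auto-zoom idea in the AZTR entry above (not its actual implementation), the snippet below crops a margin around a detected human bounding box and rescales the crop to a fixed input size, so that distant aerial targets occupy more of the frame.

```python
# Illustrative auto-zoom step: crop around a person detection and rescale.
# The detector itself is assumed given; box format is (x, y, w, h) pixels.
import numpy as np
import cv2  # pip install opencv-python

def auto_zoom(frame, box, out_size=224, margin=0.2):
    x, y, w, h = box
    mx, my = int(w * margin), int(h * margin)          # context margin
    x0, y0 = max(x - mx, 0), max(y - my, 0)
    x1, y1 = min(x + w + mx, frame.shape[1]), min(y + h + my, frame.shape[0])
    crop = frame[y0:y1, x0:x1]
    return cv2.resize(crop, (out_size, out_size))      # scale target up

frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # stand-in aerial frame
print(auto_zoom(frame, box=(600, 300, 40, 90)).shape)  # (224, 224, 3)
```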
- Aerial Vision-and-Dialog Navigation [10.596163697911525]
We introduce Aerial Vision-and-Dialog Navigation (AVDN), a task of navigating a drone via natural language conversation.
We build a drone simulator with a continuous environment and collect a new AVDN dataset of over 3k recorded navigation trajectories.
We propose an effective Human Attention Aided Transformer model (HAA-Transformer) which learns to predict both navigation waypoints and human attention.
arXiv Detail & Related papers (2022-05-24T17:28:14Z)
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Motion planning that takes human comfort into account is typically missing from the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
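For readers unfamiliar with the MPC pattern used in the entry above, here is a bare-bones receding-horizon loop based on random shooting: optimize a control sequence against reaching and smoothness costs, apply only the first control, and re-plan. The dynamics, costs, and solver are stand-ins, not the paper's formulation.

```python
# Bare-bones receding-horizon MPC via random shooting (structure only).
import numpy as np

rng = np.random.default_rng(1)

def rollout_cost(pos, controls, hand, smooth_w=0.5, dt=0.1):
    """Cost of a candidate control sequence under a trivial kinematic model."""
    cost, prev_u = 0.0, np.zeros(3)
    for u in controls:
        pos = pos + u * dt                            # integrate velocity
        cost += np.sum((pos - hand) ** 2)             # reach the human's hand
        cost += smooth_w * np.sum((u - prev_u) ** 2)  # penalize abrupt motion
        prev_u = u
    return cost

def mpc_step(pos, hand, horizon=10, samples=256, u_max=0.5):
    cands = rng.uniform(-u_max, u_max, size=(samples, horizon, 3))
    best = min(cands, key=lambda c: rollout_cost(pos, c, hand))
    return best[0]                                    # apply first control only

pos, hand = np.zeros(3), np.array([0.4, 0.2, 0.3])
for _ in range(20):                                   # receding-horizon loop
    pos = pos + mpc_step(pos, hand) * 0.1
print(np.round(pos, 2))                               # end-effector near hand
```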
- Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.