Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech
Recognition
- URL: http://arxiv.org/abs/2009.04215v1
- Date: Wed, 9 Sep 2020 11:17:45 GMT
- Title: Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech
Recognition
- Authors: Ruben Contreras, Angel Ayala, Francisco Cruz
- Abstract summary: We present a domain-based speech recognition architecture to control an unmanned aerial vehicle such as a drone.
The drone is controlled through a more natural, human-like way of communicating instructions.
We implement an algorithm for command interpretation in both Spanish and English.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned aerial vehicles, such as drones, are becoming part of
our lives and reaching many areas of society, including industry. A common
way to control the movements and actions of a drone is through wireless
tactile interfaces, for which different remote control devices are
available. However, control through such devices is not a natural,
human-like communication interface, and it can be difficult for some users
to master. In this work, we present a domain-based speech recognition
architecture to effectively control an unmanned aerial vehicle such as a
drone. The drone is controlled through a more natural, human-like way of
communicating instructions. Moreover, we implement an algorithm for command
interpretation in both Spanish and English, as well as for controlling the
movements of the drone in a simulated domestic environment. In the
conducted experiments, participants gave voice commands to the drone in
both languages in order to compare the effectiveness of each, taking into
account the participants' mother tongue. Additionally, different levels of
distortion were applied to the voice commands to test the proposed approach
against noisy input signals. The obtained results show that the unmanned
aerial vehicle is capable of interpreting user voice instructions, with
speech-to-action recognition improving for both languages when phoneme
matching is used compared to the cloud-based algorithm alone without
domain-based instructions. Using raw audio inputs, the cloud-based approach
achieves 74.81% and 97.04% accuracy for English and Spanish instructions,
respectively, whereas our phoneme matching approach improves these results
to 93.33% and 100.00% for English and Spanish.
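To make the domain-based idea concrete, here is a minimal sketch
(illustrative only, not the authors' code) of command interpretation over a
fixed vocabulary. It assumes a cloud ASR transcript is already available;
the COMMANDS table and interpret() helper are hypothetical, and Python's
difflib string similarity stands in for the paper's phoneme matching, which
would compare phoneme sequences rather than raw characters.

```python
# Minimal sketch of domain-based command interpretation (illustrative,
# not the authors' implementation). A cloud ASR service is assumed to
# have already produced a raw transcript; the transcript is then snapped
# onto a small, fixed command vocabulary. difflib's string similarity
# stands in here for the paper's phoneme matching.
import difflib

# Hypothetical domain vocabulary: spoken commands (English and Spanish)
# mapped to drone actions.
COMMANDS = {
    "take off": "TAKEOFF",        "despega": "TAKEOFF",
    "land": "LAND",               "aterriza": "LAND",
    "move forward": "FORWARD",    "avanza": "FORWARD",
    "move back": "BACK",          "retrocede": "BACK",
    "turn left": "LEFT",          "gira a la izquierda": "LEFT",
    "turn right": "RIGHT",        "gira a la derecha": "RIGHT",
}

def interpret(transcript: str, cutoff: float = 0.6) -> str | None:
    """Return the action of the in-domain command closest to the
    transcript, or None if nothing in the domain is close enough."""
    match = difflib.get_close_matches(
        transcript.lower().strip(), list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[match[0]] if match else None

# A slightly misrecognized transcript still resolves to an action:
print(interpret("meve forward"))        # FORWARD
print(interpret("gira a la iskierda"))  # LEFT
print(interpret("open the window"))     # None (out of domain)
```

Restricting hypotheses to a small in-domain vocabulary is what lets a
slightly misrecognized transcript still resolve to the intended drone
action; the accuracy gains reported above come from exactly this kind of
constraint over the unconstrained cloud output.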
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Learning to Communicate Functional States with Nonverbal Expressions for Improved Human-Robot Collaboration [3.5408317027307055]
Collaborative robots must effectively communicate their internal state to humans to enable a smooth interaction.
We propose a reinforcement learning algorithm based on noisy human feedback to produce accurately interpreted nonverbal auditory expressions.
arXiv Detail & Related papers (2024-04-30T04:18:21Z)
- ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis [50.69464138626748]
We present ConvoFusion, a diffusion-based approach for multi-modal gesture synthesis.
Our method proposes two guidance objectives that allow the users to modulate the impact of different conditioning modalities.
Our method is versatile in that it can be trained to generate either monologue gestures or conversational gestures.
arXiv Detail & Related papers (2024-03-26T17:59:52Z)
- Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning [82.91837418721182]
Adaptive interfaces can help users perform sequential decision-making tasks.
Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users.
We propose a reinforcement learning algorithm to train an interface to map raw command signals to actions.
arXiv Detail & Related papers (2023-09-07T16:52:27Z)
- Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids [69.98258892165767]
We present an aerial navigation task for the 2023 ICCV Conversation History.
We propose an effective method of fusion training of Human Attention Aided Transformer model (HAA-Transformer) and Human Attention Aided LSTM (HAA-LSTM) models.
arXiv Detail & Related papers (2023-08-27T10:32:52Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Robust Sensor Fusion Algorithms Against Voice Command Attacks in Autonomous Vehicles [8.35945218644081]
We propose a novel multimodal deep learning classification system to defend against inaudible command attacks.
Our experimental results confirm the feasibility of the proposed defense methods and the best classification accuracy reaches 89.2%.
arXiv Detail & Related papers (2021-04-20T10:08:46Z)
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
- American Sign Language Identification Using Hand Trackpoint Analysis [0.0]
We propose a novel machine learning based pipeline for American Sign Language identification using hand track points.
We convert a hand gesture into a series of hand track point coordinates that serve as an input to our system.
Our system achieved an accuracy of 95.66% in identifying American Sign Language gestures.
arXiv Detail & Related papers (2020-10-20T19:59:16Z)
- Learn by Observation: Imitation Learning for Drone Patrolling from Videos of A Human Navigator [22.06785798356346]
We propose to let the drone learn patrolling in the air by observing and imitating how a human navigator does it on the ground.
The observation process enables the automatic collection and annotation of data using inter-frame geometric consistency.
A newly designed neural network is trained based on the annotated data to predict appropriate directions and translations.
arXiv Detail & Related papers (2020-08-30T15:20:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.