Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
- URL: http://arxiv.org/abs/2409.10048v1
- Date: Mon, 16 Sep 2024 07:20:33 GMT
- Title: Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
- Authors: Wessel Ledder, Yuzhen Qin, Kiki van der Heijden,
- Abstract summary: We propose an audio-driven DRL framework to develop an autonomous agent that orients towards a talker in the acoustic environment.
Our results show that the agent learned to perform the task at a near perfect level when trained on speech segments in anechoic environments.
- Score: 0.7373617024876725
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although deep reinforcement learning (DRL) approaches in audio signal processing have seen substantial progress in recent years, audio-driven DRL for tasks such as navigation, gaze control and head-orientation control in the context of human-robot interaction have received little attention. Here, we propose an audio-driven DRL framework in which we utilise deep Q-learning to develop an autonomous agent that orients towards a talker in the acoustic environment based on stereo speech recordings. Our results show that the agent learned to perform the task at a near perfect level when trained on speech segments in anechoic environments (that is, without reverberation). The presence of reverberation in naturalistic acoustic environments affected the agent's performance, although the agent still substantially outperformed a baseline, randomly acting agent. Finally, we quantified the degree of generalization of the proposed DRL approach across naturalistic acoustic environments. Our experiments revealed that policies learned by agents trained on medium or high reverb environments generalized to low reverb environments, but policies learned by agents trained on anechoic or low reverb environments did not generalize to medium or high reverb environments. Taken together, this study demonstrates the potential of audio-driven DRL for tasks such as head-orientation control and highlights the need for training strategies that enable robust generalization across environments for real-world audio-driven DRL applications.
Related papers
- Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance [42.90024643696503]
We present an end-to-end learning solution to jointly optimise the models for audio enhancement.
We consider four representative applications to evaluate our training paradigm.
arXiv Detail & Related papers (2024-08-12T16:23:58Z) - Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training [39.21885486667879]
Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes.
Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges.
We propose a novel RAG approach known as Retrieval-augmented Adaptive Adrial Training (RAAT)
arXiv Detail & Related papers (2024-05-31T16:24:53Z) - ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling [57.1025908604556]
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment.
We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment.
We introduce ActiveRIR, a reinforcement learning policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions.
arXiv Detail & Related papers (2024-04-24T21:30:01Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Direction-Aware Joint Adaptation of Neural Speech Enhancement and
Recognition in Real Multiparty Conversational Environments [21.493664174262737]
This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments.
We propose a semi-supervised adaptation method that jointly updates the mask estimator and the ASR model at run-time using clean speech signals with ground-truth transcriptions and noisy speech signals with highly-confident estimated transcriptions.
arXiv Detail & Related papers (2022-07-15T03:43:35Z) - Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z) - A Deep Reinforcement Learning Approach for Audio-based Navigation and
Audio Source Localization in Multi-speaker Environments [1.0527821704930371]
In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within.
We create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem.
We also create an autonomous agent based on PPO online reinforcement learning algorithm and attempt to train it to solve these environments.
arXiv Detail & Related papers (2021-10-25T10:18:34Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Environment Shaping in Reinforcement Learning using State Abstraction [63.444831173608605]
We propose a novel framework of emphenvironment shaping using state abstraction.
Our key idea is to compress the environment's large state space with noisy signals to an abstracted space.
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
arXiv Detail & Related papers (2020-06-23T17:00:22Z) - Robust Reinforcement Learning via Adversarial training with Langevin
Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.