Scene-Agnostic Multi-Microphone Speech Dereverberation
- URL: http://arxiv.org/abs/2010.11875v2
- Date: Thu, 10 Jun 2021 18:17:26 GMT
- Title: Scene-Agnostic Multi-Microphone Speech Dereverberation
- Authors: Yochai Yemini, Ethan Fetaya, Haggai Maron and Sharon Gannot
- Abstract summary: We present an NN architecture that can cope with microphone arrays whose number and positions are unknown.
Our approach harnesses recent advances in deep learning on set-structured data to design an architecture that enhances the reverberant log-spectrum.
- Score: 47.735158037490834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks (NNs) have been widely applied in speech processing tasks,
and, in particular, those employing microphone arrays. Nevertheless, most
existing NN architectures can only deal with fixed and position-specific
microphone arrays. In this paper, we present an NN architecture that can cope
with microphone arrays whose number and positions of the microphones are
unknown, and demonstrate its applicability in the speech dereverberation task.
To this end, our approach harnesses recent advances in deep learning on
set-structured data to design an architecture that enhances the reverberant
log-spectrum. We use noisy and noiseless versions of a simulated reverberant
dataset to test the proposed architecture. Our experiments on the noisy data
show that the proposed scene-agnostic setup outperforms a powerful scene-aware
framework, sometimes even with fewer microphones. With the noiseless dataset we
show that, in most cases, our method outperforms the position-aware network as
well as the state-of-the-art weighted linear prediction error (WPE) algorithm.
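The scene-agnostic property comes from deep learning on set-structured data: the per-microphone features are processed by shared weights and combined with a symmetric pooling operation, so the model is indifferent to both the ordering and the number of channels. A minimal DeepSets-style sketch of this idea (toy dimensions and weight matrices invented for illustration; this is not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions, for illustration only.
F = 8          # frequency bins of the log-spectrum
H = 16         # hidden width of the shared per-microphone network

# Shared weights applied identically to every microphone channel (phi),
# followed by a network acting on the pooled representation (rho).
W_phi = rng.standard_normal((F, H))
W_rho = rng.standard_normal((H, F))

def enhance(log_specs):
    """log_specs: (M, F) array, one log-spectrum frame per microphone.
    Works for any number M of microphones, presented in any order."""
    h = np.tanh(log_specs @ W_phi)   # phi: shared per-channel embedding
    pooled = h.mean(axis=0)          # symmetric pooling -> set invariance
    return pooled @ W_rho            # rho: map to an enhanced log-spectrum

# The output is unchanged under any permutation of the microphones,
# and the same model accepts arrays of different sizes.
x = rng.standard_normal((5, F))              # a 5-microphone array
perm = rng.permutation(5)
assert np.allclose(enhance(x), enhance(x[perm]))
y3 = enhance(rng.standard_normal((3, F)))    # a 3-microphone array, same model
```

Because the only cross-channel operation is a mean, nothing in the network depends on microphone count or placement, which is what lets a single trained model serve unseen array geometries.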
Related papers
- A unified multichannel far-field speech recognition system: combining
neural beamforming with attention based end-to-end model [14.795953417531907]
We propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system.
The proposed method achieves a 19.26% improvement over a strong baseline.
arXiv Detail & Related papers (2024-01-05T07:11:13Z) - A Study of Designing Compact Audio-Visual Wake Word Spotting System
Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate on designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF)
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z) - Voice Activity Detection for Transient Noisy Environment Based on
Diffusion Nets [13.558688470594674]
We address voice activity detection in acoustic environments containing transient and stationary noise.
We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure.
A deep neural network is trained to separate speech from non-speech frames.
arXiv Detail & Related papers (2021-06-25T17:05:26Z) - Data Fusion for Audiovisual Speaker Localization: Extending Dynamic
Stream Weights to the Spatial Domain [103.3388198420822]
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
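The core idea of region-specific dynamic stream weights is to blend per-modality localization scores with a learned weight per spatial region, trusting audio more where it is reliable and video more elsewhere. A toy sketch of this fusion rule (all scores and weights below are invented for illustration, not values from the paper):

```python
import numpy as np

# Per-region localization scores from each modality (hypothetical values).
audio_scores = np.array([0.1, 0.7, 0.1, 0.1])  # e.g. acoustic DOA likelihoods
video_scores = np.array([0.2, 0.2, 0.5, 0.1])  # e.g. face-detector confidences

# Region-specific stream weights in [0, 1]: lam close to 1 trusts audio,
# close to 0 trusts video. In the paper these are estimated dynamically.
lam = np.array([0.9, 0.5, 0.2, 0.5])

# Convex per-region combination of the two streams.
fused = lam * audio_scores + (1.0 - lam) * video_scores

# The speaker position estimate is the highest-scoring region.
estimate = int(np.argmax(fused))
```

Assigning the weights per region (rather than one global weight) lets the fusion down-weight a modality only where it is locally unreliable, e.g. video in poorly lit regions.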
arXiv Detail & Related papers (2021-02-23T09:59:31Z) - Neural Network-based Virtual Microphone Estimator [111.79608275698274]
We propose a neural network-based virtual microphone estimator (NN-VME)
The NN-VME estimates virtual microphone signals directly in the time domain, by utilizing the precise estimation capability of the recent time-domain neural networks.
Experiments on the CHiME-4 corpus show that the proposed NN-VME achieves high virtual microphone estimation performance even for real recordings.
arXiv Detail & Related papers (2021-01-12T06:30:24Z) - Data-Efficient Framework for Real-world Multiple Sound Source 2D
Localization [7.564344795030588]
We propose a novel ensemble-discrimination method to improve the localization performance without requiring any label from the real data.
It enables the model to be trained with data from specific microphone array layouts while generalizing well to unseen layouts during inference.
arXiv Detail & Related papers (2020-12-10T09:22:52Z) - Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by
Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experiment results show a mean azimuth error of 13 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approaches for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z) - Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed to achieve two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.