Dereverberation using joint estimation of dry speech signal and acoustic system
- URL: http://arxiv.org/abs/2007.12581v1
- Date: Fri, 24 Jul 2020 15:33:08 GMT
- Title: Dereverberation using joint estimation of dry speech signal and acoustic system
- Authors: Sanna Wager, Keunwoo Choi, Simon Durand
- Abstract summary: Speech dereverberation aims to remove quality-degrading effects of a time-invariant impulse response filter from the signal.
In this report, we describe an approach to speech dereverberation that involves joint estimation of the dry speech signal and of the room impulse response.
- Score: 3.5131188669634885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of speech dereverberation is to remove quality-degrading effects
of a time-invariant impulse response filter from the signal. In this report, we
describe an approach to speech dereverberation that involves joint estimation
of the dry speech signal and of the room impulse response. We explore deep
learning models that apply to each task separately, and how these can be
combined in a joint model with shared parameters.
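The signal model underlying this formulation is plain convolution: the reverberant (wet) signal is the dry signal filtered by a time-invariant room impulse response. A minimal NumPy sketch of this forward model, using a noise burst and a toy synthetic impulse response as stand-ins for real audio (an illustration only, not the paper's estimation method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the dry speech signal: a short noise burst.
dry = rng.standard_normal(1000)

# Toy room impulse response: direct path plus a decaying reflection tail.
rir = np.zeros(200)
rir[0] = 1.0                                   # direct sound
decay = np.exp(-np.arange(1, 200) / 40.0)      # exponential energy decay
rir[1:] = 0.3 * decay * rng.standard_normal(199)

# Time-invariant filtering: wet = dry convolved with rir.
wet = np.convolve(dry, rir)                    # length: 1000 + 200 - 1 = 1199
print(wet.shape)
```

Joint estimation then amounts to recovering both `dry` and `rir` given only `wet`, which is what makes the blind problem ill-posed.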
Related papers
- A Hybrid Model for Weakly-Supervised Speech Dereverberation [2.731944614640173]
This paper introduces a new training strategy to improve speech dereverberation systems using minimal acoustic information and reverberant (wet) speech.
Experimental results demonstrate that our method achieves more consistent performance across various objective metrics used in speech dereverberation than the state-of-the-art.
arXiv Detail & Related papers (2025-02-06T09:21:22Z)
- Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models [21.669363620480333]
We present an unsupervised method for blind dereverberation and room impulse response estimation, called BUDDy.
In a blind scenario where the room impulse response is unknown, BUDDy successfully performs speech dereverberation.
Unlike supervised methods, which often struggle to generalize, BUDDy seamlessly adapts to different acoustic conditions.
arXiv Detail & Related papers (2024-08-14T11:31:32Z)
- Diffusion Posterior Sampling for Informed Single-Channel Dereverberation [15.16865739526702]
We present an informed single-channel dereverberation method based on conditional generation with diffusion models.
With knowledge of the room impulse response, the anechoic utterance is generated via reverse diffusion.
The proposed approach is substantially more robust to measurement noise than a state-of-the-art informed single-channel dereverberation method.
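For intuition about the informed setting (room impulse response known), the same problem can be attacked with a classical frequency-domain deconvolution baseline. The sketch below uses a regularized (Wiener-style) inverse filter on toy data; it is not the diffusion method summarized above:

```python
import numpy as np

# Toy setup: known RIR with a direct path and two discrete echoes.
rir = np.zeros(128)
rir[0], rir[40], rir[100] = 1.0, 0.5, 0.25

rng = np.random.default_rng(1)
dry = rng.standard_normal(2048)                # stand-in for anechoic speech
wet = np.convolve(dry, rir)                    # reverberant observation

# Regularized inverse filtering in the frequency domain.
n = len(wet)
H = np.fft.rfft(rir, n)
Y = np.fft.rfft(wet, n)
eps = 1e-3                                     # guards near-zero frequency bins
X_hat = Y * np.conj(H) / (np.abs(H) ** 2 + eps)
dry_hat = np.fft.irfft(X_hat, n)[: len(dry)]

rel_err = np.mean((dry_hat - dry) ** 2) / np.mean(dry ** 2)
print(rel_err < 0.05)                          # True: near-perfect recovery here
```

This baseline collapses when the RIR has spectral nulls or the measurement is noisy, which is exactly where learned informed methods earn their keep.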
arXiv Detail & Related papers (2023-06-21T14:14:05Z)
- Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation [9.416401293559112]
We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters.
Both forward and inverse models are jointly trained in a self-supervised way from raw acoustic-only speech data from different speakers.
The imitation simulations are evaluated objectively and subjectively, and show encouraging performance.
arXiv Detail & Related papers (2022-04-05T15:02:49Z)
- Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
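A common low-dimensional parameterization of an impulse response, shown here as a generic textbook model for illustration (it is not the paper's learned representation), describes an RIR by its reverberation time (RT60) and direct-to-reverberant ratio:

```python
import numpy as np

def synth_rir(n: int, rt60_s: float, fs: int, drr: float = 3.0, seed: int = 0) -> np.ndarray:
    """Toy RIR: unit direct-path impulse plus an exponentially decaying noise tail.

    rt60_s: time in seconds for the tail to decay by 60 dB.
    drr:    direct-to-reverberant peak amplitude ratio.
    """
    rng = np.random.default_rng(seed)
    tau = rt60_s * fs / np.log(1000.0)     # 60 dB corresponds to ln(1000) time constants
    t = np.arange(n)
    tail = np.exp(-t / tau) * rng.standard_normal(n)
    tail[0] = 0.0
    tail /= np.max(np.abs(tail)) * drr     # scale the tail's peak to 1/drr
    tail[0] = 1.0                          # direct sound
    return tail

rir = synth_rir(n=4000, rt60_s=0.3, fs=16000)
print(rir[0], np.max(np.abs(rir[1:])))
```

With `rt60_s` and `drr` as the only free parameters, fitting such a model to a measured response is a crude but interpretable form of impulse-response parameterization.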
arXiv Detail & Related papers (2022-02-07T18:57:23Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
After the discrete symbol sequence is predicted, each target speech signal can be re-synthesized by feeding the symbols to the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Unsupervised Sound Localization via Iterative Contrastive Learning [106.56167882750792]
We propose an iterative contrastive learning framework that requires no data annotations.
We use pseudo-labels to learn the correlation between visual and audio signals sampled from the same video.
Our iterative strategy gradually encourages the localization of the sounding objects and reduces the correlation between the non-sounding regions and the reference audio.
arXiv Detail & Related papers (2021-04-01T07:48:29Z)
- Informed Source Extraction With Application to Acoustic Echo Reduction [8.296684637620553]
Deep learning methods leverage a speaker-discriminative model that maps a reference snippet uttered by the target speaker to a single embedding vector.
We propose a time-varying source discriminative model that captures the temporal dynamics of the reference signal.
Experimental results demonstrate that the proposed method significantly improves the extraction performance when applied in an acoustic echo reduction scenario.
arXiv Detail & Related papers (2020-11-09T17:13:23Z)
- Continuous Speech Separation with Conformer [60.938212082732775]
We use transformers and conformers in lieu of recurrent neural networks in the separation system.
We believe capturing global information with a self-attention-based method is crucial for speech separation.
arXiv Detail & Related papers (2020-08-13T09:36:05Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.