Spatial Processing Front-End For Distant ASR Exploiting Self-Attention
Channel Combinator
- URL: http://arxiv.org/abs/2203.13919v1
- Date: Fri, 25 Mar 2022 21:43:15 GMT
- Title: Spatial Processing Front-End For Distant ASR Exploiting Self-Attention
Channel Combinator
- Authors: Dushyant Sharma and Rong Gong and James Fosburgh and Stanislav Yu.
Kruchinin and Patrick A. Naylor and Ljubomir Milanovic
- Abstract summary: We present a novel multi-channel front-end based on channel shortening with the Weighted Prediction Error (WPE) method.
We show that the proposed system used as part of a ContextNet based end-to-end (E2E) ASR system outperforms leading ASR systems.
- Score: 11.248169478873344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel multi-channel front-end based on channel shortening with
the Weighted Prediction Error (WPE) method followed by a fixed MVDR beamformer
used in combination with a recently proposed self-attention-based channel
combination (SACC) scheme, for tackling the distant ASR problem. We show that
the proposed system used as part of a ContextNet based end-to-end (E2E) ASR
system outperforms leading ASR systems as demonstrated by a 21.6% reduction in
relative WER on a multi-channel LibriSpeech playback dataset. We also show how
dereverberation prior to beamforming is beneficial and compare the WPE method
with a modified neural channel shortening approach. An analysis of the
non-intrusive estimate of the signal C50 confirms that the 8 channel WPE method
provides significant dereverberation of the signals (13.6 dB improvement). We
also show how the weights of the SACC system allow the extraction of accurate
spatial information which can be beneficial for other speech processing
applications like diarization.
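The abstract does not spell out how the self-attention-based channel combination works, so the following is only a minimal sketch of the general idea: per-frame attention across channels yields one weight per channel, and the output is the weighted sum of the channel magnitude spectra. The function names, the random stand-ins for learned projection matrices, and the exact layer arrangement are all assumptions made for illustration, not the authors' architecture. Pure standard-library Python is used to keep the example dependency-free.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, matrix):
    """Multiply a length-F vector by an F x D matrix (list of rows)."""
    dim = len(matrix[0])
    return [sum(x * row[d] for x, row in zip(vec, matrix)) for d in range(dim)]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned projection matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def attention_channel_combine(mag, w_q, w_k, w_v):
    """Combine channels with per-frame attention weights (illustrative only).

    mag: magnitude spectra indexed as mag[channel][frame][bin].
    w_q, w_k: F x D projection matrices; w_v: F x 1 projection matrix.
    Returns (combined, weights): combined[frame][bin], weights[frame][channel].
    """
    n_ch, n_frames, n_bins = len(mag), len(mag[0]), len(mag[0][0])
    dim = len(w_q[0])
    combined, weights = [], []
    for t in range(n_frames):
        frame = [mag[c][t] for c in range(n_ch)]
        q = [project(x, w_q) for x in frame]
        k = [project(x, w_k) for x in frame]
        v = [project(x, w_v)[0] for x in frame]  # one scalar per channel
        context = []
        for i in range(n_ch):
            # Scaled dot-product scores of channel i against every channel.
            scores = [
                sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim)
                for j in range(n_ch)
            ]
            attn = softmax(scores)
            context.append(sum(a * vj for a, vj in zip(attn, v)))
        # Softmax across channels gives weights that sum to 1 per frame.
        w = softmax(context)
        weights.append(w)
        combined.append(
            [sum(w[c] * frame[c][f] for c in range(n_ch)) for f in range(n_bins)]
        )
    return combined, weights


if __name__ == "__main__":
    rng = random.Random(0)
    channels, frames, bins, dim = 8, 3, 6, 4
    mag = [
        [[abs(rng.gauss(0.0, 1.0)) for _ in range(bins)] for _ in range(frames)]
        for _ in range(channels)
    ]
    w_q, w_k = random_matrix(bins, dim, rng), random_matrix(bins, dim, rng)
    w_v = random_matrix(bins, 1, rng)
    combined, weights = attention_channel_combine(mag, w_q, w_k, w_v)
    print("per-channel weights for frame 0:", [round(x, 3) for x in weights[0]])
```

Because the weights form a distribution over channels in each frame, they can be inspected directly, which is one plausible reading of how such weights could carry spatial information (for example, which microphones dominate while a given talker is active).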
Related papers
- Joint Sparsity Pattern Learning Based Channel Estimation for Massive
MIMO-OTFS Systems [46.42375183269616]
We propose a channel estimation scheme based on joint sparsity pattern learning (JSPL) for massive multi-input multi-output (MIMO) orthogonal time frequency space (OTFS) modulation aided systems.
Both our simulation results and analysis demonstrate that the proposed channel estimation scheme achieves an improved performance over the representative state-of-the-art baseline schemes.
arXiv Detail & Related papers (2024-03-06T15:05:39Z)
- Extreme Learning Machine-based Channel Estimation in IRS-Assisted Multi-User ISAC System [32.74137740936128]
This paper proposes, for the first time, a practical channel estimation approach for an IRS-assisted multiuser ISAC system.
A two-stage approach is proposed to decompose the overall estimation problem into sub-problems.
Given the low-cost requirements of the ISAC BS and downlink users, the proposed two-stage approach is realized by an efficient neural network (NN) framework.
arXiv Detail & Related papers (2024-01-29T14:15:11Z)
- Pay Less But Get More: A Dual-Attention-based Channel Estimation Network
for Massive MIMO Systems with Low-Density Pilots [41.213515826100696]
We propose a dual-attention-based channel estimation network (DACEN) to realize accurate channel estimation via low-density pilots.
Experimental results reveal that the proposed DACEN-based method achieves better channel estimation performance than the existing methods.
arXiv Detail & Related papers (2023-03-02T05:34:25Z)
- Self-Attention Channel Combinator Frontend for End-to-End Multichannel
Far-field Speech Recognition [1.0276024900942875]
When sufficiently large far-field training data is available, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows promising results.
Recent literature has shown that traditional beamformer designs, such as MVDR (Minimum Variance Distortionless Response) or fixed beamformers, can be successfully integrated into an E2E ASR system with learnable parameters.
We propose the self-attention channel combinator (SACC) ASR frontend, which leverages the self-attention mechanism to combine multichannel audio signals in the magnitude spectral domain.
arXiv Detail & Related papers (2021-09-10T11:03:43Z)
- Learning to Perform Downlink Channel Estimation in Massive MIMO Systems [72.76968022465469]
We study downlink (DL) channel estimation in a Massive multiple-input multiple-output (MIMO) system.
A common approach is to use the mean value as the estimate, motivated by channel hardening.
We propose two novel estimation methods.
arXiv Detail & Related papers (2021-09-06T13:42:32Z)
- Model-Driven Deep Learning Based Channel Estimation and Feedback for
Millimeter-Wave Massive Hybrid MIMO Systems [61.78590389147475]
This paper proposes a model-driven deep learning (MDDL)-based channel estimation and feedback scheme for millimeter-wave (mmWave) systems.
To reduce the uplink pilot overhead for estimating the high-dimensional channels from a limited number of radio frequency (RF) chains, we propose to jointly train the phase shift network and the channel estimator as an auto-encoder.
Numerical results show that the proposed MDDL-based channel estimation and feedback scheme outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-22T13:34:53Z)
- Deep Denoising Neural Network Assisted Compressive Channel Estimation
for mmWave Intelligent Reflecting Surfaces [99.34306447202546]
This paper proposes a deep denoising neural network assisted compressive channel estimation for mmWave IRS systems.
We first introduce a hybrid passive/active IRS architecture, where very few receive chains are employed to estimate the uplink user-to-IRS channels.
The complete channel matrix can be reconstructed from the limited measurements based on compressive sensing.
arXiv Detail & Related papers (2020-06-03T12:18:57Z)
- Audio-visual Multi-channel Recognition of Overlapped Speech [79.21950701506732]
This paper presents an audio-visual multi-channel overlapped speech recognition system featuring tightly integrated separation front-end and recognition back-end.
Experiments suggest that the proposed multi-channel AVSR system outperforms the baseline audio-only ASR system by up to 6.81% (26.83% relative) and 22.22% (56.87% relative) absolute word error rate (WER) reduction on overlapped speech constructed using either simulation or replaying of the lipreading sentence 2 dataset respectively.
arXiv Detail & Related papers (2020-05-18T10:31:19Z)
- Millimeter Wave Communications with an Intelligent Reflector:
Performance Optimization and Distributional Reinforcement Learning [119.97450366894718]
A novel framework is proposed to optimize the downlink multi-user communication of a millimeter wave base station.
A channel estimation approach is developed to measure the channel state information (CSI) in real-time.
A distributional reinforcement learning (DRL) approach is proposed to learn the optimal IR reflection and maximize the expectation of downlink capacity.
arXiv Detail & Related papers (2020-02-24T22:18:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.