Improving auditory attention decoding performance of linear and
non-linear methods using state-space model
- URL: http://arxiv.org/abs/2004.00910v1
- Date: Thu, 2 Apr 2020 09:56:06 GMT
- Title: Improving auditory attention decoding performance of linear and
non-linear methods using state-space model
- Authors: Ali Aroudi, Tobias de Taillez, and Simon Doclo
- Abstract summary: Recent advances in electroencephalography have shown that it is possible to identify the target speaker from single-trial EEG recordings.
AAD methods reconstruct the attended speech envelope from EEG recordings, based on a linear least-squares cost function or non-linear neural networks.
We investigate a state-space model using correlation coefficients obtained with a small correlation window to improve the decoding performance.
- Score: 21.40315235087551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying the target speaker in hearing aid applications is crucial to
improve speech understanding. Recent advances in electroencephalography (EEG)
have shown that it is possible to identify the target speaker from single-trial
EEG recordings using auditory attention decoding (AAD) methods. AAD methods
reconstruct the attended speech envelope from EEG recordings, based on a linear
least-squares cost function or non-linear neural networks, and then directly
compare the reconstructed envelope with the speech envelopes of speakers to
identify the attended speaker using Pearson correlation coefficients. Since
these correlation coefficients fluctuate strongly, a large correlation window
is required for reliable decoding, which causes a large processing delay. In
this paper, we investigate a state-space model using correlation coefficients
obtained with a small correlation window to improve the decoding performance of
the linear and the non-linear AAD methods. The experimental results show that
the state-space model significantly improves the decoding performance.
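The decoding pipeline described in the abstract can be sketched as follows: compute windowed Pearson correlations between the reconstructed envelope and each speaker's envelope, then smooth the noisy per-window decisions with a simple two-state Markov (state-space) filter. This is a minimal illustrative sketch, not the paper's exact state-space formulation; the transition probability, observation means, and noise scale below are all assumed values.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def decode_attention(reconstructed, env1, env2, win=64, p_stay=0.95, sigma=0.2):
    """Window-based AAD with a two-state Markov smoother.

    For each short window, the correlation difference r1 - r2 between the
    reconstructed envelope and the two speaker envelopes is treated as a
    noisy observation of the attention state (speaker 1 vs. speaker 2).
    A forward pass over a two-state Markov chain smooths these noisy
    per-window decisions. All parameters are illustrative.
    """
    n_win = len(reconstructed) // win
    belief = np.array([0.5, 0.5])          # P(attending speaker 1), P(speaker 2)
    T = np.array([[p_stay, 1 - p_stay],    # attention-switch transition matrix
                  [1 - p_stay, p_stay]])
    decisions = []
    for k in range(n_win):
        s = slice(k * win, (k + 1) * win)
        d = pearson(reconstructed[s], env1[s]) - pearson(reconstructed[s], env2[s])
        # Gaussian likelihood of the correlation difference under each state
        lik = np.array([np.exp(-(d - 0.1) ** 2 / (2 * sigma ** 2)),
                        np.exp(-(d + 0.1) ** 2 / (2 * sigma ** 2))])
        belief = lik * (T @ belief)
        belief /= belief.sum()
        decisions.append(1 if belief[0] >= 0.5 else 2)
    return decisions
```

Because the Markov smoother pools evidence across windows, individual windows can be short (reducing processing delay) while the decision sequence stays stable.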
Related papers
- DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets show superior performance, verifying the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z)
- A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model [14.795953417531907]
We propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system.
The proposed method achieves a 19.26% improvement when compared with a strong baseline.
arXiv Detail & Related papers (2024-01-05T07:11:13Z)
- BDHT: Generative AI Enables Causality Analysis for Mild Cognitive Impairment [34.60961915466469]
A brain diffuser with hierarchical transformer (BDHT) is proposed to estimate effective connectivity for mild cognitive impairment (MCI) analysis.
The proposed model achieves superior performance in terms of accuracy and robustness compared to existing approaches.
arXiv Detail & Related papers (2023-12-14T15:12:00Z)
- Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer [16.716653844774374]
We evaluate the repeatability of embeddings using the intra-class correlation coefficient (ICC).
We propose a novel regularizer, the ICC regularizer, as a complementary component for contrastive losses to guide deep neural networks to produce embeddings with higher repeatability.
We implement the ICC regularizer and apply it to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice.
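The ICC mentioned in this summary is a standard statistic. As an illustrative sketch (not the paper's regularizer itself, which differentiates through this quantity inside a contrastive loss), the one-way random-effects form ICC(1) can be computed from a matrix of repeated measurements:

```python
import numpy as np

def icc_1(x):
    """One-way random-effects intra-class correlation, ICC(1).

    x: array of shape (n_subjects, k_repeats), each row holding repeated
    measurements (e.g. embedding coordinates) for one subject/class.
    Returns a value near 1 when repeats within a subject agree closely
    relative to the spread between subjects.
    """
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    # Between-subject and within-subject mean squares
    msb = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msw = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

High repeatability means within-subject variance (MSW) is small relative to between-subject variance (MSB), pushing ICC toward 1.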
arXiv Detail & Related papers (2023-10-25T23:21:46Z)
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
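The EER metric cited here is the operating point where false acceptances and false rejections are equally likely. A minimal sketch of estimating it from raw verification scores (a coarse threshold sweep, not an interpolated EER and not DASA's evaluation code):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Equal error rate (EER) from verification scores.

    Sweeps thresholds over all observed scores and returns the error rate
    at the point where the false-acceptance rate (FAR) and false-rejection
    rate (FRR) are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = 1.0, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A "14.6% relative reduction in EER" means the new EER is 85.4% of the baseline EER, e.g. a baseline of 10% EER dropping to about 8.54%.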
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Fast and efficient speech enhancement with variational autoencoders [0.0]
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
We propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors.
Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
arXiv Detail & Related papers (2022-11-02T09:52:13Z)
- Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns demonstrate strong dependencies.
A neural network is employed to act as structure prior and reveal the underlying signal interdependencies.
Deep unrolling and Deep equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
- Correlation based Multi-phasal models for improved imagined speech EEG recognition [22.196642357767338]
This work aims to profit from the parallel information contained in multi-phasal EEG data recorded while speaking, imagining and performing articulatory movements corresponding to specific speech units.
A bi-phase common representation learning module using neural networks is designed to model the correlation between an analysis phase and a support phase.
The proposed approach further handles the non-availability of multi-phasal data during decoding.
arXiv Detail & Related papers (2020-11-04T09:39:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.