Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models
- URL: http://arxiv.org/abs/2506.24024v1
- Date: Mon, 30 Jun 2025 16:31:26 GMT
- Title: Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models
- Authors: Nicolas Heintz, Tom Francart, Alexander Bertrand,
- Abstract summary: Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG)<n>We propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention.<n>We show how a HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings.
- Score: 48.164830545588785
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG), to identify which speaker a listener is focusing on in a multi-speaker environment. While state-of-the-art AAD algorithms can identify the attended speaker on short time windows, their predictions are often too inaccurate for practical use. In this work, we propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention. More specifically, the HMM relies on the fact that a subject is much less likely to switch attention than to keep attending the same speaker at any moment in time. We show how a HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings. We further demonstrate that HMMs outperform existing postprocessing approaches in both accuracy and responsiveness, and explore how various factors such as window length, switching frequency, and AAD accuracy influence overall performance. The proposed method is computationally efficient, intuitive to use and applicable in both real-time and offline settings.
Related papers
- From Vision to Sound: Advancing Audio Anomaly Detection with Vision-Based Algorithms [6.643376250301589]
We investigate the adaptation of such algorithms to the audio domain to address the problem of Audio Anomaly Detection (AAD)<n>Unlike most existing AAD methods, which primarily classify anomalous samples, our approach introduces fine-grained temporal-frequency localization of anomalies within the spectrogram.<n>We evaluate our approach on industrial and environmental benchmarks, demonstrating the effectiveness of VAD techniques in detecting anomalies in audio signals.
arXiv Detail & Related papers (2025-02-25T16:22:42Z) - AADNet: Exploring EEG Spatiotemporal Information for Fast and Accurate Orientation and Timbre Detection of Auditory Attention Based on A Cue-Masked Paradigm [4.479495549911642]
Auditory attention decoding from electroencephalogram (EEG) could infer to which source the user is attending in noisy environments.<n>This study proposed a cue-masked auditory attention paradigm to avoid information leakage before the experiment.<n>An end-to-end deep learning model, AADNet, was proposed to exploit thetemporal information from the short time window EEG signals.
arXiv Detail & Related papers (2025-01-07T06:51:17Z) - It's Never Too Late: Fusing Acoustic Information into Large Language
Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF)
arXiv Detail & Related papers (2024-02-08T07:21:45Z) - What to Remember: Self-Adaptive Continual Learning for Audio Deepfake
Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z) - Brain-Driven Representation Learning Based on Diffusion Model [25.375490061512]
Denoising diffusion probabilistic models (DDPMs) are explored in our research as a means to address this issue.
Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms.
Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals.
arXiv Detail & Related papers (2023-11-14T05:59:58Z) - Anomalous Sound Detection using Audio Representation with Machine ID
based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z) - Learning Hidden Markov Models Using Conditional Samples [72.20944611510198]
This paper is concerned with the computational complexity of learning the Hidden Markov Model (HMM)
In this paper, we consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs.
Specifically, we obtain efficient algorithms for learning HMMs in settings where we have query access to the exact conditional probabilities.
arXiv Detail & Related papers (2023-02-28T16:53:41Z) - IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing [88.35145788575348]
Image anomaly detection (IAD) is an emerging and vital computer vision task in industrial manufacturing.
The lack of a uniform IM benchmark is hindering the development and usage of IAD methods in real-world applications.
We construct a comprehensive image anomaly detection benchmark (IM-IAD), which includes 19 algorithms on seven major datasets.
arXiv Detail & Related papers (2023-01-31T01:24:45Z) - Dissecting User-Perceived Latency of On-Device E2E Speech Recognition [34.645194215436966]
We show that factors affecting token emission latency, and endpointing behavior significantly impact on user-perceived latency (UPL)
We achieve the best trade-off between latency and word error rate when performing ASR jointly with endpointing, and using the recently proposed alignment regularization.
arXiv Detail & Related papers (2021-04-06T00:55:11Z) - Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual
Speech Enhancement [26.596930749375474]
We introduce the use of a latent sequential variable with Markovian dependencies to switch between different VAE architectures through time.
We derive the corresponding variational expectation-maximization algorithm to estimate the parameters of the model and enhance the speech signal.
arXiv Detail & Related papers (2021-02-08T11:45:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.