Related papers: EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction

URL: http://arxiv.org/abs/2508.08924v1
Date: Tue, 12 Aug 2025 13:20:25 GMT
Title: EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction
Authors: Rui Feng, Yuang Chen, Yu Hu, Jun Du, Jiahong Yuan
Abstract summary: EGGCodec is a robust neural Encodec framework engineered for electroglottography (EGG) signal reconstruction and F0 extraction. We propose a multi-scale frequency-domain loss function to capture the nuanced relationship between original and reconstructed EGG signals. EGGCodec outperforms state-of-the-art F0 extraction schemes, reducing mean absolute error (MAE) from 14.14 Hz to 13.69 Hz, and improving voicing decision error (VDE) by 38.2%.
Score: 48.921538847138315
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This letter introduces EGGCodec, a robust neural Encodec framework engineered for electroglottography (EGG) signal reconstruction and F0 extraction. We propose a multi-scale frequency-domain loss function to capture the nuanced relationship between original and reconstructed EGG signals, complemented by a time-domain correlation loss to improve generalization and accuracy. Unlike conventional Encodec models that extract F0 directly from features, EGGCodec leverages reconstructed EGG signals, which more closely correspond to F0. By removing the conventional GAN discriminator, we streamline EGGCodec's training process without compromising efficiency, incurring only negligible performance degradation. Trained on a widely used EGG-inclusive dataset, extensive evaluations demonstrate that EGGCodec outperforms state-of-the-art F0 extraction schemes, reducing mean absolute error (MAE) from 14.14 Hz to 13.69 Hz, and improving voicing decision error (VDE) by 38.2%. Moreover, extensive ablation experiments validate the contribution of each component of EGGCodec.
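
The abstract names two training objectives, a multi-scale frequency-domain loss and a time-domain correlation loss, without giving their exact form. The following is a minimal sketch of how such a pair of losses might be combined, not the authors' implementation. The frame lengths, the Hann window, the L1 spectral distance, the use of Pearson correlation, and the `corr_weight` parameter are all illustrative assumptions, as are the function names.

```python
import cmath
import math


def frame_magnitudes(signal, frame_len, hop):
    """Magnitude spectra of Hann-windowed frames, using a naive DFT (stdlib only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra


def multi_scale_spectral_loss(reference, estimate, frame_lens=(16, 32, 64)):
    """Mean L1 distance between magnitude spectra, averaged over several frame lengths.

    The frame lengths are assumed values chosen for a short toy signal.
    """
    total = 0.0
    for frame_len in frame_lens:
        hop = frame_len // 4
        ref = frame_magnitudes(reference, frame_len, hop)
        est = frame_magnitudes(estimate, frame_len, hop)
        diffs = [abs(a - b) for r, e in zip(ref, est) for a, b in zip(r, e)]
        total += sum(diffs) / len(diffs)
    return total / len(frame_lens)


def correlation_loss(reference, estimate):
    """One minus the Pearson correlation of two waveforms (0 when perfectly correlated)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_e = sum(estimate) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    denom = math.sqrt(var_r * var_e)
    if denom == 0.0:
        return 1.0  # correlation is undefined for a constant signal; treat as worst case
    return 1.0 - cov / denom


def combined_loss(reference, estimate, corr_weight=1.0):
    """Spectral loss plus weighted correlation loss; corr_weight is a hypothetical knob."""
    return multi_scale_spectral_loss(reference, estimate) + corr_weight * correlation_loss(
        reference, estimate
    )


if __name__ == "__main__":
    # Toy stand-ins for an EGG waveform and a reconstruction (2 kHz sampling assumed).
    target = [math.sin(2 * math.pi * 100 * n / 2000) for n in range(256)]
    recon = [math.sin(2 * math.pi * 150 * n / 2000) for n in range(256)]
    print("identical signals:", combined_loss(target, target))
    print("mismatched F0:    ", combined_loss(target, recon))
```

In this sketch the correlation term is invariant to amplitude scaling, while the spectral term is not, which is one plausible reason to pair the two; a real training setup would compute both on batched tensors with an FFT-based STFT.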

Related papers

EEGReXferNet: A Lightweight Gen-AI Framework for EEG Subspace Reconstruction via Cross-Subject Transfer Learning and Channel-Aware Embedding [2.1349209400003937]
We introduce EEGReXferNet, a lightweight Gen-AI framework for EEG subspace reconstruction via cross-subject transfer learning. EEGReXferNet employs volume conduction across neighboring channels, band-specific convolution encoding, and dynamic latent feature extraction through sliding windows.
arXiv Detail & Related papers (2025-10-26T02:15:25Z)
FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection [54.576802512108685]
Graph Anomaly Detection (GAD) aims to identify nodes that deviate from the majority within a graph. Existing approaches often suffer from high deployment costs and poor scalability due to their complex and resource-intensive training processes. We propose FreeGAD, a novel training-free yet effective GAD method.
arXiv Detail & Related papers (2025-08-14T12:37:20Z)
ECG Latent Feature Extraction with Autoencoders for Downstream Prediction Tasks [2.2616169634370076]
The electrocardiogram (ECG) is an inexpensive and widely available tool for cardiac assessment. Despite its standardized format and small file size, the high complexity and inter-individual variability of ECG signals make it challenging to use in deep learning models. This study addresses these challenges by exploring feature generation methods from representative beat ECGs. We introduce three novel Variational Autoencoder (VAE) variants (Stochastic Autoencoder (SAE), Annealed beta-VAE (A beta-VAE), and Cyclical beta-VAE (C beta-VAE)) and compare their effectiveness in maintaining
arXiv Detail & Related papers (2025-07-31T19:37:05Z)
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
arXiv Detail & Related papers (2025-05-27T13:30:46Z)
Comparison of Autoencoder Encodings for ECG Representation in Downstream Prediction Tasks [2.2616169634370076]
We introduce three novel Variational Autoencoder (VAE) variants: Stochastic Autoencoder (SAE), Annealed beta-VAE (Abeta-VAE), and Cyclical beta-VAE (Cbeta-VAE). The Abeta-VAE achieved superior signal reconstruction, reducing the mean absolute error (MAE) to 15.7 ± 3.2 microvolts, which is at the level of signal noise. Our findings demonstrate that these VAE encodings are not only effective in simplifying ECG data but also provide a practical solution for applying deep learning in contexts with limited-scale labeled training data.
arXiv Detail & Related papers (2024-10-03T19:30:05Z)
ALF: Adaptive Label Finetuning for Scene Graph Generation [116.59868289196157]
Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. The long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. We introduce a one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions. ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
arXiv Detail & Related papers (2023-12-29T01:37:27Z)
ECG Artifact Removal from Single-Channel Surface EMG Using Fully Convolutional Networks [9.468136300919062]
This study proposed a novel denoising method to eliminate ECG artifacts from single-channel sEMG signals using fully convolutional networks (FCN). The proposed method adopts a denoising autoencoder structure and the powerful nonlinear mapping capability of neural networks for sEMG denoising.
arXiv Detail & Related papers (2022-10-24T14:12:11Z)
Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations. We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
Orthogonal Features Based EEG Signals Denoising Using Fractional and Compressed One-Dimensional CNN AutoEncoder [3.8580784887142774]
This paper presents a fractional one-dimensional convolutional neural network (CNN) autoencoder for denoising electroencephalogram (EEG) signals. EEG signals often get contaminated with noise during the recording process, mostly due to muscle artifacts (MA).
arXiv Detail & Related papers (2021-04-16T13:58:05Z)
Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training. We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.