Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
- URL: http://arxiv.org/abs/2504.02302v1
- Date: Thu, 03 Apr 2025 06:18:30 GMT
- Title: Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
- Authors: Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li
- Abstract summary: Speech separation (SS) seeks to disentangle a multi-talker speech mixture into single-talker speech streams. Causal separation models, which rely only on past and present information, offer a promising solution for real-time streaming. We introduce a novel frontend that is designed to mitigate the mismatch between training and run-time inference by implicitly incorporating future information into causal models.
- Score: 42.63061599979695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech separation (SS) seeks to disentangle a multi-talker speech mixture into single-talker speech streams. Although SS can be generally achieved using offline methods, such a processing paradigm is not suitable for real-time streaming applications. Causal separation models, which rely only on past and present information, offer a promising solution for real-time streaming. However, these models typically suffer from notable performance degradation due to the absence of future context. In this paper, we introduce a novel frontend that is designed to mitigate the mismatch between training and run-time inference by implicitly incorporating future information into causal models through predictive patterns. The pretrained frontend employs a transformer decoder network with a causal convolutional encoder as the backbone and is pretrained in a self-supervised manner with two innovative pretext tasks: autoregressive hybrid prediction and contextual knowledge distillation. These tasks enable the model to capture predictive patterns directly from mixtures in a self-supervised manner. The pretrained frontend subsequently serves as a feature extractor to generate high-quality predictive patterns. Comprehensive evaluations on synthetic and real-world datasets validated the effectiveness of the proposed pretrained frontend.
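The abstract describes the backbone but the page carries no code, so here is a minimal PyTorch sketch of that backbone: a causal convolutional encoder feeding a decoder-only transformer stack with a causal attention mask. All class names, layer sizes, and the frame rate are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the described backbone: causal conv encoder + transformer
# decoder with a causal mask. Hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvEncoder(nn.Module):
    def __init__(self, d_model=256, kernel_size=16, stride=8):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.conv = nn.Conv1d(1, d_model, kernel_size, stride=stride)

    def forward(self, wav):                      # wav: (batch, samples)
        x = wav.unsqueeze(1)                     # (batch, 1, samples)
        # Left-pad so each output frame only sees past and present samples.
        x = F.pad(x, (self.kernel_size - self.stride, 0))
        return self.conv(x).transpose(1, 2)      # (batch, frames, d_model)

class CausalFrontend(nn.Module):
    def __init__(self, d_model=256, n_layers=6, n_heads=8):
        super().__init__()
        self.encoder = CausalConvEncoder(d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        # Decoder-only stack: self-attention restricted by a causal mask.
        self.decoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, wav):
        feats = self.encoder(wav)
        T = feats.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        return self.decoder(feats, mask=mask)    # predictive features

frontend = CausalFrontend()
feats = frontend(torch.randn(2, 16000))          # 1 s of 16 kHz mixture audio
print(feats.shape)                               # torch.Size([2, 2000, 256])
```

The left padding in the encoder and the upper-triangular mask are what keep every output frame independent of future samples, matching the causal constraint the abstract emphasizes.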
Related papers
- Generative Regression Based Watch Time Prediction for Short-Video Recommendation [36.95095097454143]
Watch time prediction (WTP) has emerged as a pivotal task in short-video recommendation systems.
Recent studies have attempted to address the shortcomings of direct regression by converting continuous watch time estimation into an ordinal regression task.
We propose a novel Generative Regression (GR) framework that reformulates WTP as a sequence generation task.
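To make the reformulation concrete, here is a toy sketch of one way a continuous watch time can become a token sequence for a generative model; the digit encoding below is my assumption, not the paper's exact scheme.

```python
# Toy reformulation of watch time prediction as sequence generation:
# encode a watch time in seconds as fixed-length base-10 digit tokens,
# which an autoregressive model could then be trained to emit.

def watch_time_to_tokens(seconds: int, n_digits: int = 3) -> list[int]:
    """Discretize a watch time into a fixed-length digit-token sequence."""
    seconds = min(seconds, 10 ** n_digits - 1)   # clamp to representable range
    return [(seconds // 10 ** i) % 10 for i in reversed(range(n_digits))]

def tokens_to_watch_time(tokens: list[int]) -> int:
    """Invert the encoding to recover a point estimate."""
    value = 0
    for t in tokens:
        value = value * 10 + t
    return value

assert tokens_to_watch_time(watch_time_to_tokens(127)) == 127
```

A generator conditioned on user and video features would then be trained with cross-entropy over these tokens rather than a regression loss.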
arXiv Detail & Related papers (2024-12-28T16:48:55Z)
- Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting [23.50574069148193]
We present Proceed, a novel proactive model adaptation framework for online time series forecasting. Proceed first estimates the concept drift between the recently used training samples and the current test sample, then employs an adaptation generator to efficiently translate the estimated drift into parameter adjustments.
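A hedged sketch of that two-stage idea follows: estimate a drift vector from recent training data versus the incoming test sample, then map it to parameter offsets. The mean-shift drift estimate and the tiny generator MLP are my assumptions.

```python
# Sketch of proactive adaptation: drift estimate -> parameter adjustments.
import torch
import torch.nn as nn

class DriftAdapter(nn.Module):
    def __init__(self, feat_dim: int, n_params: int):
        super().__init__()
        # "Adaptation generator": translates an estimated drift vector into
        # additive offsets for a chosen subset of forecaster parameters.
        self.generator = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_params))

    def forward(self, train_feats, test_feats):
        # Crude drift estimate: shift between mean feature statistics.
        drift = test_feats.mean(dim=0) - train_feats.mean(dim=0)
        return self.generator(drift)             # parameter deltas

head = nn.Linear(32, 1)                          # forecaster's output head
adapter = DriftAdapter(feat_dim=32, n_params=32)
delta = adapter(torch.randn(128, 32), torch.randn(1, 32))
adapted_weight = head.weight + delta             # apply adjustment at test time
```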
arXiv Detail & Related papers (2024-12-11T14:57:10Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting [1.5213268724320657]
Recurrent neural network-based sequence-to-sequence models have been extensively applied for multi-step-ahead time series forecasting.
These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs.
This study proposes a novel training approach called reinforced decoder, which introduces auxiliary models to generate alternative decoder inputs.
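The sketch below illustrates the space of decoder-input choices this entry discusses: ground truth (teacher forcing), the model's own previous forecast, or an auxiliary model's output. The random selection policy is a placeholder assumption; the paper's reinforced strategy is learned.

```python
# Seq2seq decoder with switchable input sources at each step.
import random
import torch
import torch.nn as nn

class Seq2SeqDecoder(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, h, horizon, y_true=None, aux_model=None, history=None):
        inp, preds = torch.zeros(h.size(0), 1), []
        for t in range(horizon):
            h = self.cell(inp, h)
            y_hat = self.out(h)
            preds.append(y_hat)
            choice = random.choice(["truth", "self", "aux"])
            if choice == "truth" and y_true is not None:
                inp = y_true[:, t:t + 1]          # teacher forcing
            elif choice == "aux" and aux_model is not None:
                inp = aux_model(history, t)       # auxiliary forecaster
            else:
                inp = y_hat.detach()              # model's own forecast
        return torch.cat(preds, dim=1)            # (batch, horizon)
```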
arXiv Detail & Related papers (2024-06-14T00:24:29Z)
- Fine-grained Forecasting Models Via Gaussian Process Blurring Effect [6.472434306724611]
Time series forecasting is a challenging task due to the existence of complex and dynamic temporal dependencies.
Using more training data is one way to improve accuracy, but the amount of available data is often limited.
We build on successful denoising approaches for image generation by advocating an end-to-end forecasting and denoising paradigm.
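One minimal reading of that paradigm is sketched below (my assumption, not the paper's model): a forecaster produces a coarse, blurred forecast and a second network sharpens it, with both trained end to end.

```python
# Forecast-then-denoise sketch: coarse forecast + residual sharpening.
import torch
import torch.nn as nn

class ForecastThenDenoise(nn.Module):
    def __init__(self, context=48, horizon=24, hidden=128):
        super().__init__()
        self.forecaster = nn.Sequential(
            nn.Linear(context, hidden), nn.ReLU(), nn.Linear(hidden, horizon))
        self.denoiser = nn.Sequential(
            nn.Linear(horizon, hidden), nn.ReLU(), nn.Linear(hidden, horizon))

    def forward(self, x):
        coarse = self.forecaster(x)              # blurred initial forecast
        return coarse + self.denoiser(coarse)    # residual sharpening

model = ForecastThenDenoise()
y = model(torch.randn(8, 48))                    # (batch, horizon)
```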
arXiv Detail & Related papers (2023-12-21T20:25:16Z)
- From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition [64.59093444558549]
We propose a simple, easy-to-implement, two-step training pipeline that we call From Fake to Real (FFR).
By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data.
Our experiments show that FFR improves worst-group accuracy over the state-of-the-art by up to 20% across three datasets.
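The two-step schedule is simple enough to sketch directly; the loaders, epoch counts, and optimizer below are placeholder assumptions.

```python
# FFR-style schedule: step 1 trains on group-balanced synthetic images,
# step 2 fine-tunes on real data, so the two distributions never share a loss.
import torch

def ffr_train(model, synthetic_loader, real_loader, epochs=(5, 5)):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for loader, n_epochs in ((synthetic_loader, epochs[0]),
                             (real_loader, epochs[1])):
        for _ in range(n_epochs):
            for images, labels in loader:
                opt.zero_grad()
                loss = loss_fn(model(images), labels)
                loss.backward()
                opt.step()
    return model
```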
arXiv Detail & Related papers (2023-08-08T19:52:28Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
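Below is a sketch of how a self-supervised error signal can enter split conformal prediction as a difficulty estimate that rescales nonconformity scores; the exact combination rule in the paper may differ, and `conformal_interval` and its arguments are illustrative names.

```python
# Split conformal prediction with an SSL-error-normalized score.
import numpy as np

def conformal_interval(pred, ssl_err, cal_pred, cal_y, cal_ssl_err, alpha=0.1):
    # Difficulty-normalized nonconformity scores on the calibration set.
    scores = np.abs(cal_y - cal_pred) / (1.0 + cal_ssl_err)
    n = len(scores)
    level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
    q = np.quantile(scores, level, method="higher")
    half_width = q * (1.0 + ssl_err)   # wider intervals where SSL error is high
    return pred - half_width, pred + half_width
```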
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses the predictions obtained by prompting the pretrained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
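One natural rendering of that regularizer is a KL term pulling the fine-tuned model toward the frozen model's zero-shot prompt predictions; the plain KL form and the trade-off weight are my assumptions.

```python
# Prompt-regularized fine-tuning loss: task loss + KL to zero-shot prompts.
import torch
import torch.nn.functional as F

def proreg_loss(finetuned_logits, frozen_prompt_logits, labels, beta=0.5):
    task = F.cross_entropy(finetuned_logits, labels)
    reg = F.kl_div(F.log_softmax(finetuned_logits, dim=-1),
                   F.softmax(frozen_prompt_logits.detach(), dim=-1),
                   reduction="batchmean")
    return task + beta * reg
```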
arXiv Detail & Related papers (2023-01-29T11:53:55Z)
- PReTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer [0.9786690381850356]
We introduce a model called PRediction Transformer (PReTR) that extracts features from multi-agent scenes by employing a factorized spatio-temporal attention module.
It requires less computation than previously studied models while achieving empirically better results.
We leverage encoder-decoder Transformer networks to decode a set of learned object queries in parallel, as sketched below.
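The sketch shows the DETR-style core of that idea: each learned query attends to encoded scene features and yields one future waypoint, with no step-by-step recursion. Layer counts and the 2-D output head are assumptions.

```python
# Non-autoregressive decoding with learned queries.
import torch
import torch.nn as nn

class ParallelTrajectoryDecoder(nn.Module):
    def __init__(self, d_model=128, horizon=12, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(horizon, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, 2)        # (x, y) per future step

    def forward(self, scene_memory):             # (batch, tokens, d_model)
        q = self.queries.unsqueeze(0).expand(scene_memory.size(0), -1, -1)
        return self.head(self.decoder(q, scene_memory))  # (batch, horizon, 2)

traj = ParallelTrajectoryDecoder()(torch.randn(4, 50, 128))
```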
arXiv Detail & Related papers (2022-03-17T12:52:23Z)
- Test-time Collective Prediction [73.74982509510961]
A common machine learning setting involves multiple parties who want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
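A simple instantiation of such a mechanism is sketched below (the paper's scheme is more elaborate): each agent shares only a prediction and a confidence, which are fused by confidence weighting, so raw data and model parameters stay private.

```python
# Confidence-weighted fusion of per-agent predictions at test time.
import numpy as np

def collective_predict(agent_preds, agent_confidences):
    w = np.asarray(agent_confidences, dtype=float)
    w = w / w.sum()                              # normalize confidences
    return float(np.dot(w, np.asarray(agent_preds, dtype=float)))

print(collective_predict([0.9, 1.4, 1.1], [2.0, 0.5, 1.0]))
```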
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Aligned Contrastive Predictive Coding [10.521845940927163]
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations.
Rather than producing an individual prediction for each future representation, the model emits a sequence of predictions that is shorter than the sequence of upcoming representations to which it will be aligned.
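Here is a toy rendering of that idea under a strong simplifying assumption: each of the m predictions is aligned to a fixed-length window of upcoming frames (the paper learns the alignment instead); the contrastive scoring follows standard CPC practice.

```python
# Aligned-CPC-style loss with a fixed one-prediction-per-window alignment.
import torch
import torch.nn.functional as F

def aligned_cpc_loss(preds, future, window=4):
    # preds: (m, d); future: (m * window, d) upcoming representations.
    targets = future.reshape(preds.size(0), window, -1).mean(dim=1)
    logits = preds @ targets.t()                 # (m, m) similarity matrix
    labels = torch.arange(preds.size(0))
    return F.cross_entropy(logits, labels)       # InfoNCE over predictions

loss = aligned_cpc_loss(torch.randn(8, 64), torch.randn(32, 64))
```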
arXiv Detail & Related papers (2021-04-24T13:07:22Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
A variational autoencoder does not necessarily encode typical samples generated from its own decoder consistently; we study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
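A self-consistency penalty in that spirit can be sketched as follows: the encoder should map the decoder's own generations back to the latents that produced them. The plain MSE form and the encoder's (mu, logvar) interface are assumptions.

```python
# Self-consistency sketch: re-encode decoder samples back toward their latents.
import torch

def self_consistency_loss(encoder, decoder, z):
    x_gen = decoder(z)                           # sample from the decoder
    mu, _logvar = encoder(x_gen)                 # re-encode the generation
    return torch.mean((mu - z) ** 2)             # pull the re-encoding to z
```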
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.