Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio
- URL: http://arxiv.org/abs/2205.05871v1
- Date: Thu, 12 May 2022 04:11:25 GMT
- Title: Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio
- Authors: Yin-Jyun Luo, Sebastian Ewert, Simon Dixon
- Abstract summary: Disentangled sequential autoencoders (DSAEs) are a class of probabilistic graphical models.
We show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables.
We propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are then used to regularise the model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentangled sequential autoencoders (DSAEs) are a class of
probabilistic graphical models that describe an observed sequence with dynamic
latent variables and a static latent variable. The former encode information at
a frame rate identical to the observation, while the latter governs the entire
sequence globally. This introduces an inductive bias and facilitates
unsupervised disentanglement of the underlying local and global factors. In
this paper, we show that the vanilla DSAE is sensitive to the choice of model
architecture and to the capacity of the dynamic latent variables, and is prone
to collapsing the static latent variable. As a countermeasure, we propose
TS-DSAE, a two-stage training framework that first learns sequence-level prior
distributions, which are subsequently employed to regularise the model and to
provide auxiliary objectives that promote disentanglement. The proposed
framework is fully unsupervised and robust against the global-factor collapse
problem across a wide range of model configurations. It also avoids typical
workarounds such as adversarial training, which usually involves laborious
parameter tuning, and domain-specific data augmentation. We conduct
quantitative and qualitative evaluations to demonstrate its robustness in terms
of disentanglement on both artificial and real-world music audio datasets.
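To make the model family concrete, below is a minimal sketch of a vanilla DSAE in PyTorch. All architectural choices here (GRU encoders, pooling the static latent from the final hidden state, standard-normal priors, layer sizes) are illustrative assumptions rather than the authors' configuration; the sketch only shows how the per-frame dynamic latents z_t and the sequence-level static latent s enter the negative ELBO, and where the static latent can collapse.

```python
import torch
import torch.nn as nn

class DSAE(nn.Module):
    """Vanilla DSAE sketch: dynamic latents z_1:T plus one static latent s."""

    def __init__(self, x_dim=64, z_dim=8, s_dim=16, h_dim=128):
        super().__init__()
        # Dynamic path: per-frame latents at the observation frame rate.
        self.z_rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.z_params = nn.Linear(h_dim, 2 * z_dim)  # mean and log-variance
        # Static path: a single global latent for the whole sequence.
        self.s_rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.s_params = nn.Linear(h_dim, 2 * s_dim)
        # Decoder reconstructs each frame from the pair [z_t, s].
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + s_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim)
        )

    @staticmethod
    def reparam(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    @staticmethod
    def kl_to_std_normal(mu, logvar):
        # Elementwise KL(q || N(0, I)); callers sum over non-batch dims.
        return -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())

    def forward(self, x):                       # x: (batch, T, x_dim)
        h_z, _ = self.z_rnn(x)                  # per-frame features
        mu_z, logvar_z = self.z_params(h_z).chunk(2, dim=-1)
        _, h_s = self.s_rnn(x)                  # final hidden state
        mu_s, logvar_s = self.s_params(h_s[-1]).chunk(2, dim=-1)
        z = self.reparam(mu_z, logvar_z)        # (batch, T, z_dim)
        s = self.reparam(mu_s, logvar_s)        # (batch, s_dim)
        s_tiled = s.unsqueeze(1).expand(-1, x.size(1), -1)
        x_hat = self.decoder(torch.cat([z, s_tiled], dim=-1))
        # Negative ELBO: frame-wise reconstruction plus KL of both latents
        # against standard-normal priors (an assumption; DSAE variants often
        # place a learned dynamical prior over z_1:T instead).
        recon = (x - x_hat).pow(2).sum(dim=(1, 2)).mean()
        kl_z = self.kl_to_std_normal(mu_z, logvar_z).sum(dim=(1, 2)).mean()
        kl_s = self.kl_to_std_normal(mu_s, logvar_s).sum(dim=1).mean()
        return recon + kl_z + kl_s

x = torch.randn(4, 100, 64)   # a toy batch: 4 sequences of 100 frames
loss = DSAE()(x)
loss.backward()
```

In the collapse mode the paper describes, the KL term for s is driven towards zero and the decoder ignores the static latent entirely. TS-DSAE's remedy, roughly paraphrased from the abstract, is a two-stage schedule: first learn sequence-level prior distributions, then reuse them as regularisers and auxiliary objectives; the sketch above implements only the vanilla baseline that exhibits the problem.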
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction (2024-10-10)
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
- FiP: a Fixed-Point Approach for Causal Generative Modeling (2024-04-10)
We propose a new and equivalent formalism that does not require DAGs to describe fixed-point problems on the causally ordered variables.
We show three important cases in which these models can be uniquely recovered given the topological ordering (TO).
- Conditional Denoising Diffusion for Sequential Recommendation (2023-04-22)
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have known drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
- Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series (2023-01-26)
NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables.
We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference.
Empirical results on multiple benchmark datasets show improved imputation and forecasting performance of NCDSSM over existing models.
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion (2023-01-23)
We introduce an energy-constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
- Benchmarking the Robustness of LiDAR Semantic Segmentation Models (2023-01-03)
In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.
We propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy.
We design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications.
- Deep Neural Dynamic Bayesian Networks applied to EEG sleep spindles modeling (2020-10-16)
We propose a generative model for single-channel EEG that incorporates the constraints experts actively enforce during visual scoring.
We derive algorithms for exact, tractable inference as a special case of Generalized Expectation Maximization.
We validate the model on three public datasets and provide evidence that more complex models are able to surpass state-of-the-art detectors.
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference (2020-06-15)
We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
- S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation (2020-05-23)
We propose a sequential variational autoencoder to learn disentangled representations of sequential data under self-supervision.
We exploit the benefits of readily accessible supervisory signals from the input data itself or from off-the-shelf functional models.
Our model can easily disentangle the representation of an input sequence into static factors and dynamic factors.
- Variational Hyper RNN for Sequence Modeling (2020-02-24)
We propose a novel probabilistic sequence model that excels at capturing high variability in time series data.
Our method uses temporal latent variables to capture information about the underlying data pattern.
The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data.
This list is automatically generated from the titles and abstracts of the papers on this site.