Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio
- URL: http://arxiv.org/abs/2205.05871v1
- Date: Thu, 12 May 2022 04:11:25 GMT
- Title: Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio
- Authors: Yin-Jyun Luo, Sebastian Ewert, Simon Dixon
- Abstract summary: Disentangled sequential autoencoders (DSAEs) are a class of probabilistic graphical models.
We show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables.
We propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are then used to regularise the model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentangled sequential autoencoders (DSAEs) are a class of
probabilistic graphical models that describe an observed sequence with dynamic
latent variables and a static latent variable. The former encode information at
a frame rate identical to the observation, while the latter governs the entire
sequence globally. This introduces an inductive bias and facilitates
unsupervised disentanglement of the underlying local and global factors. In
this paper, we show that the vanilla DSAE is sensitive to the choice of model
architecture and to the capacity of the dynamic latent variables, and is prone
to collapsing the static latent variable. As a countermeasure, we propose
TS-DSAE, a two-stage training framework that first learns sequence-level prior
distributions, which are subsequently employed to regularise the model and to
provide auxiliary objectives that promote disentanglement. The proposed
framework is fully unsupervised and robust against the global-factor collapse
problem across a wide range of model configurations. It also avoids typical
workarounds such as adversarial training, which usually involves laborious
parameter tuning, and domain-specific data augmentation. We conduct
quantitative and qualitative evaluations to demonstrate its robustness in terms
of disentanglement on both artificial and real-world music audio datasets.
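To make the model family concrete, below is a minimal sketch of a vanilla DSAE in PyTorch. All architectural choices here (GRU encoders, pooling the static latent from the final hidden state, standard-normal priors, layer sizes) are illustrative assumptions rather than the authors' configuration; the sketch only shows how the per-frame dynamic latents z_t and the sequence-level static latent s enter the negative ELBO, and where the static latent can collapse.

```python
import torch
import torch.nn as nn

class DSAE(nn.Module):
    """Vanilla DSAE sketch: dynamic latents z_1:T plus one static latent s."""

    def __init__(self, x_dim=64, z_dim=8, s_dim=16, h_dim=128):
        super().__init__()
        # Dynamic path: per-frame latents at the observation frame rate.
        self.z_rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.z_params = nn.Linear(h_dim, 2 * z_dim)  # mean and log-variance
        # Static path: a single global latent for the whole sequence.
        self.s_rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.s_params = nn.Linear(h_dim, 2 * s_dim)
        # Decoder reconstructs each frame from the pair [z_t, s].
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + s_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim)
        )

    @staticmethod
    def reparam(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    @staticmethod
    def kl_to_std_normal(mu, logvar):
        # Elementwise KL(q || N(0, I)); callers sum over non-batch dims.
        return -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())

    def forward(self, x):                       # x: (batch, T, x_dim)
        h_z, _ = self.z_rnn(x)                  # per-frame features
        mu_z, logvar_z = self.z_params(h_z).chunk(2, dim=-1)
        _, h_s = self.s_rnn(x)                  # final hidden state
        mu_s, logvar_s = self.s_params(h_s[-1]).chunk(2, dim=-1)
        z = self.reparam(mu_z, logvar_z)        # (batch, T, z_dim)
        s = self.reparam(mu_s, logvar_s)        # (batch, s_dim)
        s_tiled = s.unsqueeze(1).expand(-1, x.size(1), -1)
        x_hat = self.decoder(torch.cat([z, s_tiled], dim=-1))
        # Negative ELBO: frame-wise reconstruction plus KL of both latents
        # against standard-normal priors (an assumption; DSAE variants often
        # place a learned dynamical prior over z_1:T instead).
        recon = (x - x_hat).pow(2).sum(dim=(1, 2)).mean()
        kl_z = self.kl_to_std_normal(mu_z, logvar_z).sum(dim=(1, 2)).mean()
        kl_s = self.kl_to_std_normal(mu_s, logvar_s).sum(dim=1).mean()
        return recon + kl_z + kl_s

x = torch.randn(4, 100, 64)   # a toy batch: 4 sequences of 100 frames
loss = DSAE()(x)
loss.backward()
```

In the collapse mode the paper describes, the KL term for s is driven towards zero and the decoder ignores the static latent entirely. TS-DSAE's remedy, roughly paraphrased from the abstract, is a two-stage schedule: first learn sequence-level prior distributions, then reuse them as regularisers and auxiliary objectives; the sketch above implements only the vanilla baseline that exhibits the problem.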
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction (2024-10-10)
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
- FiP: a Fixed-Point Approach for Causal Generative Modeling (2024-04-10)
We propose a new and equivalent formalism that does not require DAGs to describe fixed-point problems on the causally ordered variables.
We show three important cases in which these models can be uniquely recovered given the topological ordering (TO).
- Conditional Denoising Diffusion for Sequential Recommendation (2023-04-22)
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have known drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
- Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series (2023-01-26)
NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables.
We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference.
Empirical results on multiple benchmark datasets show improved imputation and forecasting performance of NCDSSM over existing models.
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion (2023-01-23)
We introduce an energy-constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
- Benchmarking the Robustness of LiDAR Semantic Segmentation Models (2023-01-03)
In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.
We propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy.
We design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications.
- Deep Neural Dynamic Bayesian Networks applied to EEG sleep spindles modeling (2020-10-16)
We propose a generative model for single-channel EEG that incorporates the constraints experts actively enforce during visual scoring.
We derive algorithms for exact, tractable inference as a special case of Generalized Expectation Maximization.
We validate the model on three public datasets and provide evidence that more complex models are able to surpass state-of-the-art detectors.
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference (2020-06-15)
We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
- S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation (2020-05-23)
We propose a sequential variational autoencoder to learn disentangled representations of sequential data under self-supervision.
We exploit the benefits of readily accessible supervisory signals from the input data itself or from off-the-shelf functional models.
Our model can easily disentangle the representation of an input sequence into static factors and dynamic factors.
- Variational Hyper RNN for Sequence Modeling (2020-02-24)
We propose a novel probabilistic sequence model that excels at capturing high variability in time series data.
Our method uses temporal latent variables to capture information about the underlying data pattern.
The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data.
This list is automatically generated from the titles and abstracts of the papers on this site.