Mixture of Dynamical Variational Autoencoders for Multi-Source
Trajectory Modeling and Separation
- URL: http://arxiv.org/abs/2312.04167v1
- Date: Thu, 7 Dec 2023 09:36:31 GMT
- Title: Mixture of Dynamical Variational Autoencoders for Multi-Source
Trajectory Modeling and Separation
- Authors: Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda
- Abstract summary: We propose a mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.
We illustrate the versatility of the proposed MixDVAE model on two tasks: a computer vision task, namely multi-object tracking, and an audio processing task, namely single-channel audio source separation.
- Score: 28.24190848937156
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we propose a latent-variable generative model called mixture
of dynamical variational autoencoders (MixDVAE) to model the dynamics of a
system composed of multiple moving sources. A DVAE model is pre-trained on a
single-source dataset to capture the source dynamics. Then, multiple instances
of the pre-trained DVAE model are integrated into a multi-source mixture model
with a discrete observation-to-source assignment latent variable. The posterior
distributions of both the discrete observation-to-source assignment variable
and the continuous DVAE variables representing the sources' content/position are
estimated using a variational expectation-maximization algorithm, leading to
multi-source trajectory estimation. We illustrate the versatility of the
proposed MixDVAE model on two tasks: a computer vision task, namely
multi-object tracking, and an audio processing task, namely single-channel
audio source separation. Experimental results show that the proposed method
works well on these two tasks, and outperforms several baseline methods.
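For intuition, here is a minimal NumPy sketch of what the observation-to-source assignment E-step of such a variational EM loop could look like, assuming an isotropic Gaussian observation model; all names, shapes, and the fixed variance are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def assignment_e_step(obs, src_means, log_prior, sigma2=0.1):
    """obs: (T, K, D) observations; src_means: (T, N, D) current source
    estimates from the N pre-trained DVAE instances; log_prior: (N,).
    Returns responsibilities eta[t, k, n] = q(observation k at time t
    was generated by source n)."""
    diff = obs[:, :, None, :] - src_means[:, None, :, :]   # (T, K, N, D)
    log_lik = -0.5 * (diff ** 2).sum(-1) / sigma2          # (T, K, N)
    log_post = log_lik + log_prior                          # broadcast over n
    log_post -= log_post.max(-1, keepdims=True)             # numerical stability
    eta = np.exp(log_post)
    return eta / eta.sum(-1, keepdims=True)

# The full VEM loop would alternate this step with a DVAE E-step (each
# source's pre-trained encoder applied to responsibility-weighted
# pseudo-observations) and an M-step updating the assignment prior.
```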
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
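A rough NumPy sketch of the elect-mask-rescale idea as suggested by the name: elect a unified task vector, mask it per task, and rescale per task. The specific election and rescaling rules below are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def emr_merge(task_vectors):
    """task_vectors: list of per-task parameter deltas, each (num_params,)."""
    tv = np.stack(task_vectors)                        # (num_tasks, num_params)
    # Elect: a unified sign per parameter, then the largest agreeing magnitude.
    sign = np.sign(tv.sum(0))
    agree = np.sign(tv) == sign                        # doubles as per-task mask
    unified = sign * np.where(agree, np.abs(tv), 0.0).max(0)
    # Rescale: one scalar per task to match the masked merge to the task's scale.
    rescaled = []
    for t in range(tv.shape[0]):
        masked = agree[t] * unified
        lam = np.abs(tv[t]).sum() / max(np.abs(masked).sum(), 1e-12)
        rescaled.append(lam * masked)
    return unified, agree, rescaled
```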
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions [42.293444710522294]
Continual Multi-source Adaptation to Dynamic Distributions (CONTRAST) is a novel method that optimally combines multiple source models to adapt to the dynamic test data.
We show that the proposed method is able to optimally combine the source models and prioritize updates to the model least prone to forgetting.
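The combination step alone admits a short PyTorch sketch: a convex, learnable mixture of frozen source-model predictions. The test-time objective used to fit the weights and the forgetting-aware update rule are the paper's contributions and are not reproduced here.

```python
import torch

def combine_sources(models, x, weights):
    """models: list of S frozen source models; weights: learnable (S,) logits."""
    probs = torch.stack([m(x).softmax(-1) for m in models])  # (S, B, C)
    w = weights.softmax(0)                                     # (S,) mixture
    return (w[:, None, None] * probs).sum(0)                   # (B, C)
```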
arXiv Detail & Related papers (2024-01-04T22:23:56Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
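Loosely in the spirit of such cross-modal interaction layers, a minimal PyTorch sketch where each modality gates the other's channels; dimensions, placement, and the sigmoid gating are assumptions, not the DG-SCT design.

```python
import torch.nn as nn

class CrossModalChannelGate(nn.Module):
    """One modality produces a channel-attention gate for the other."""
    def __init__(self, a_dim, v_dim):
        super().__init__()
        self.a2v = nn.Sequential(nn.Linear(a_dim, v_dim), nn.Sigmoid())
        self.v2a = nn.Sequential(nn.Linear(v_dim, a_dim), nn.Sigmoid())

    def forward(self, a, v):            # a: (B, a_dim), v: (B, v_dim)
        return a * self.v2a(v), v * self.a2v(a)
```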
arXiv Detail & Related papers (2023-11-09T05:24:20Z) - Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives [5.549794481031468]
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research.
In this work, we consider a variational objective that can tightly approximate the data log-likelihood.
We develop more flexible aggregation schemes that avoid the inductive biases in PoE or MoE approaches.
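For reference, the product-of-experts (PoE) aggregation whose inductive bias such work aims to avoid has a closed form for Gaussian experts: precisions add and means are precision-weighted. A NumPy sketch (the paper's more flexible, permutation-invariant aggregator is not reproduced here):

```python
import numpy as np

def poe_gaussian(mus, vars_):
    """mus, vars_: (M, D) per-modality Gaussian posterior parameters.
    Returns the parameters of their normalized product."""
    prec = 1.0 / np.asarray(vars_)                 # per-expert precisions
    var = 1.0 / prec.sum(0)                        # precisions add
    mu = var * (prec * np.asarray(mus)).sum(0)     # precision-weighted mean
    return mu, var
```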
arXiv Detail & Related papers (2023-09-01T10:32:21Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
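One simple way to fuse same-architecture generators from different domains is parameter-space interpolation; sampling from the blend yields data off both training manifolds. A sketch of that idea, which is illustrative and not necessarily the paper's exact recipe:

```python
def interpolate_state_dicts(sd_a, sd_b, alpha=0.5):
    """Blend two PyTorch state dicts of same-architecture generators
    trained on different domains (alpha=0 -> model A, alpha=1 -> model B)."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

# usage sketch, assuming gen_a/gen_b/gen_mix share an architecture:
# gen_mix.load_state_dict(interpolate_state_dicts(
#     gen_a.state_dict(), gen_b.state_dict(), alpha=0.3))
```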
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
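A sample-specific ensemble can be sketched in PyTorch as a small gating network that assigns each input its own mixing weights over frozen source-model outputs; names and dimensions here are assumptions, not SESoM's exact architecture.

```python
import torch
import torch.nn as nn

class SampleSpecificEnsemble(nn.Module):
    """Per-sample convex combination of source-model logits."""
    def __init__(self, feat_dim, num_sources):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_sources)

    def forward(self, feats, source_logits):   # (B, F), (S, B, C)
        w = self.gate(feats).softmax(-1)        # (B, S) per-sample weights
        return torch.einsum('bs,sbc->bc', w, source_logits)
```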
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder [25.293475313066967]
We present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE).
DVAE is a latent-variable deep generative model that can be seen as an extension of the variational autoencoder for the modeling of temporal sequences.
It is included in DVAE-UMOT to model the objects' dynamics, after being pre-trained on an unlabeled synthetic dataset of single-object trajectories.
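To make the DVAE family concrete, a minimal PyTorch sketch of one sequential-VAE variant: an RNN summarizes past observations and a per-step latent z_t is inferred and decoded. This is one illustrative member of the family, not the exact model used in DVAE-UMOT.

```python
import torch
import torch.nn as nn

class MiniDVAE(nn.Module):
    def __init__(self, x_dim=2, z_dim=8, h_dim=32):
        super().__init__()
        self.rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.enc = nn.Linear(h_dim + x_dim, 2 * z_dim)   # q(z_t | h_t, x_t)
        self.dec = nn.Linear(h_dim + z_dim, x_dim)       # p(x_t | h_t, z_t)

    def forward(self, x):                                # x: (B, T, x_dim)
        h, _ = self.rnn(x)
        h = torch.roll(h, 1, dims=1)                     # h_t summarizes x_{<t}
        h[:, 0] = 0.0                                    # no past at t = 0
        mu, logvar = self.enc(torch.cat([h, x], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([h, z], -1)), mu, logvar
```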
arXiv Detail & Related papers (2022-02-18T17:27:27Z)
- Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity into the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
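A toy PyTorch sketch of the general idea: a dot-product collaborative-filtering model whose user and item embeddings pass through a reparameterized Gaussian layer, injecting stochasticity into the latent space. The architecture is an illustrative assumption, not the paper's models.

```python
import torch
import torch.nn as nn

class VariationalMF(nn.Module):
    def __init__(self, n_users, n_items, d=32):
        super().__init__()
        self.u = nn.Embedding(n_users, 2 * d)   # stores (mu, logvar) per user
        self.i = nn.Embedding(n_items, 2 * d)   # stores (mu, logvar) per item

    def sample(self, e):
        mu, logvar = e.chunk(2, -1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, users, items):            # predicted rating/score
        return (self.sample(self.u(users)) * self.sample(self.i(items))).sum(-1)
```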
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
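The two-stage shape of this pipeline can be sketched in PyTorch: stage one is any penalty-based disentangling VAE producing codes z, and stage two refines the coarse reconstruction conditioned on z. The residual refiner below is an illustrative stand-in for the paper's second-stage generative model.

```python
import torch.nn as nn

class Refiner(nn.Module):
    """Second stage: restore detail that the disentangled code alone misses."""
    def __init__(self, z_dim=10, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim))

    def forward(self, coarse_recon, z):
        # coarse_recon: stage-1 decoder output; z: disentangled code
        return coarse_recon + self.net(z)
```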
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Variational Dynamic Mixtures [18.730501689781214]
We develop variational dynamic mixtures (VDM) to infer sequential latent variables.
In an empirical study, we show that VDM outperforms competing approaches on highly multi-modal datasets.
arXiv Detail & Related papers (2020-10-20T16:10:07Z)
- Relaxed-Responsibility Hierarchical Discrete VAEs [3.976291254896486]
We introduce Relaxed-Responsibility Vector-Quantisation, a novel way to parameterise discrete latent variables.
We achieve state-of-the-art bits-per-dim results for various standard datasets.
arXiv Detail & Related papers (2020-07-14T19:10:05Z)