Learning Sequential Latent Variable Models from Multimodal Time Series Data
- URL: http://arxiv.org/abs/2204.10419v1
- Date: Thu, 21 Apr 2022 21:59:24 GMT
- Title: Learning Sequential Latent Variable Models from Multimodal Time Series Data
- Authors: Oliver Limoyo, Trevor Ablett, and Jonathan Kelly
- Abstract summary: We present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data.
We demonstrate that our approach leads to significant improvements in prediction and representation quality.
- Score: 6.107812768939553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential modelling of high-dimensional data is an important problem that
appears in many domains including model-based reinforcement learning and
dynamics identification for control. Latent variable models applied to
sequential data (i.e., latent dynamics models) have been shown to be a
particularly effective probabilistic approach to solve this problem, especially
when dealing with images. However, in many application areas (e.g., robotics),
information from multiple sensing modalities is available -- existing latent
dynamics methods have not yet been extended to effectively make use of such
multimodal sequential data. Multimodal sensor streams can be correlated in a
useful manner and often contain complementary information across modalities. In
this work, we present a self-supervised generative modelling framework to
jointly learn a probabilistic latent state representation of multimodal data
and the respective dynamics. Using synthetic and real-world datasets from a
multimodal robotic planar pushing task, we demonstrate that our approach leads
to significant improvements in prediction and representation quality.
Furthermore, we compare to the common learning baseline of concatenating each
modality in the latent space and show that our principled probabilistic
formulation performs better. Finally, despite being fully self-supervised, we
demonstrate that our method is nearly as effective as an existing supervised
approach that relies on ground truth labels.
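The abstract contrasts concatenating each modality in the latent space with a "principled probabilistic formulation" but does not spell out the fusion rule. The snippet below is a minimal sketch, assuming that each modality's encoder outputs a diagonal-Gaussian posterior and that these are fused by a product of experts (PoE), a common choice in multimodal VAEs. Whether this matches the authors' exact formulation is an assumption. A hypothetical linear-Gaussian transition stands in for the learned latent dynamics. The function names, the "image" and "force" encoders, and all numbers are illustrative, not taken from the paper.

```python
import numpy as np


def product_of_experts(mus, logvars):
    """Fuse per-modality diagonal Gaussians N(mu_i, exp(logvar_i)).

    The renormalized product of Gaussians is Gaussian: its precision is the
    sum of the experts' precisions, and its mean is the precision-weighted
    average of the experts' means.
    """
    precisions = np.exp(-np.stack(logvars))      # 1 / var_i, shape (M, D)
    total_precision = precisions.sum(axis=0)     # shape (D,)
    mu = (precisions * np.stack(mus)).sum(axis=0) / total_precision
    return mu, -np.log(total_precision)          # log of the fused variance


def concatenate(mus, logvars):
    """Baseline: stack per-modality latents into one longer vector."""
    return np.concatenate(mus), np.concatenate(logvars)


def predict_next(mu, logvar, A, B, u, q):
    """Hypothetical linear-Gaussian transition z' = A z + B u + w, w ~ N(0, diag(q)).

    Returns the predicted mean and full covariance of z'.
    """
    mean = A @ mu + B @ u
    cov = A @ np.diag(np.exp(logvar)) @ A.T + np.diag(q)
    return mean, cov


# Hypothetical 2-D posteriors from an image encoder and a force encoder.
mu_img, lv_img = np.array([0.0, 1.0]), np.log(np.array([1.0, 0.25]))
mu_frc, lv_frc = np.array([2.0, 1.0]), np.log(np.array([1.0, 1.0]))

mu_poe, lv_poe = product_of_experts([mu_img, mu_frc], [lv_img, lv_frc])
mu_cat, lv_cat = concatenate([mu_img, mu_frc], [lv_img, lv_frc])
print(mu_poe, np.exp(lv_poe))   # fused mean ≈ [1, 1], fused variance ≈ [0.5, 0.2]
print(mu_cat.shape)             # the concatenated latent has 4 dimensions

# One prediction step in the fused latent space with a made-up action u.
A, B = np.eye(2), np.array([[1.0], [0.0]])
u, q = np.array([0.5]), np.array([0.01, 0.01])
mean_next, cov_next = predict_next(mu_poe, lv_poe, A, B, u, q)
```

Under PoE, the fused variance in each dimension is smaller than every expert's variance in that dimension, so the more confident modality dominates each latent coordinate. Concatenation instead doubles the latent size and leaves any weighting between modalities to downstream networks.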
Related papers
- Learning Multimodal Latent Generative Models with Energy-Based Prior [3.6648642834198797]
We propose a novel framework that integrates the latent generative model with the EBM.
This approach results in a more expressive and informative prior that better captures information across multiple modalities.
Date: 2024-09-30T01:38:26Z
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
Date: 2024-07-17T14:44:25Z
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
Date: 2024-04-13T13:39:26Z
- Latent variable model for high-dimensional point process with structured missingness [4.451479907610764]
Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology.
Real-world datasets can be high-dimensional, contain structured missingness patterns, and measurement time points can be governed by an unknown process.
We propose a flexible and efficient latent-variable model that is capable of addressing all these limitations.
Date: 2024-02-08T15:41:48Z
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
Date: 2023-10-03T17:37:52Z
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
Date: 2023-08-20T12:43:52Z
- Learning Latent Dynamics via Invariant Decomposition and (Spatio-)Temporal Transformers [0.6767885381740952]
We propose a method for learning dynamical systems from high-dimensional empirical data.
We focus on the setting in which data are available from multiple different instances of a system.
We study behaviour through simple theoretical analyses and extensive experiments on synthetic and real-world datasets.
Date: 2023-06-21T07:52:07Z
- Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
Date: 2022-05-16T07:53:42Z
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
Date: 2020-07-02T15:08:11Z
- Variational Hyper RNN for Sequence Modeling [69.0659591456772]
We propose a novel probabilistic sequence model that excels at capturing high variability in time series data.
Our method uses temporal latent variables to capture information about the underlying data pattern.
The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data.
Date: 2020-02-24T19:30:32Z