Related papers: Learning Sequential Latent Variable Models from Multimodal Time Series Data

Learning Sequential Latent Variable Models from Multimodal Time Series Data

URL: http://arxiv.org/abs/2204.10419v1
Date: Thu, 21 Apr 2022 21:59:24 GMT
Title: Learning Sequential Latent Variable Models from Multimodal Time Series Data
Authors: Oliver Limoyo, Trevor Ablett, and Jonathan Kelly
Abstract summary: We present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data. We demonstrate that our approach leads to significant improvements in prediction and representation quality.
Score: 6.107812768939553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is available -- existing latent dynamics methods have not yet been extended to effectively make use of such multimodal sequential data. Multimodal sensor streams can be correlated in a useful manner and often contain complementary information across modalities. In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. Using synthetic and real-world datasets from a multimodal robotic planar pushing task, we demonstrate that our approach leads to significant improvements in prediction and representation quality. Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. Finally, despite being fully self-supervised, we demonstrate that our method is nearly as effective as an existing supervised approach that relies on ground truth labels.

Related papers

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning [49.10890099624699]
We introduce a dynamic dataset pruning framework that adaptively selects training samples based on task-driven difficulty and cross-modality semantic consistency.<n>Our work highlights the potential of integrating cross-modality alignment for robust sample selection, advancing data-centric learning toward more efficient and robust practices across application domains.
arXiv Detail & Related papers (2025-07-17T03:08:26Z)
Continual Multimodal Contrastive Learning [70.60542106731813]
Multimodal contrastive learning (MCL) advances in aligning different modalities and generating multimodal representations in a joint space. However, a critical yet often overlooked challenge remains: multimodal data is rarely collected in a single process, and training from scratch is computationally expensive. In this paper, we formulate CMCL through two specialized principles of stability and plasticity. We theoretically derive a novel optimization-based method, which projects updated gradients from dual sides onto subspaces where any gradient is prevented from interfering with the previously learned knowledge.
arXiv Detail & Related papers (2025-03-19T07:57:08Z)
Learning Multimodal Latent Generative Models with Energy-Based Prior [3.6648642834198797]
We propose a novel framework that integrates the latent generative model with the EBM. This approach results in a more expressive and informative prior, better-capturing of information across multiple modalities.
arXiv Detail & Related papers (2024-09-30T01:38:26Z)
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges. This often leads to the issue of missing modalities, where data for certain modalities are absent. We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications. Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders. We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities.<n>Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation.<n>Experiments on a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
Latent variable model for high-dimensional point process with structured missingness [4.451479907610764]
Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology. Real-world datasets can be high-dimensional, contain structured missingness patterns, and measurement time points can be governed by an unknown process. We propose a flexible and efficient latent-variable model that is capable of addressing all these limitations.
arXiv Detail & Related papers (2024-02-08T15:41:48Z)
Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs) We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models. Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
Learning Latent Dynamics via Invariant Decomposition and (Spatio-)Temporal Transformers [0.6767885381740952]
We propose a method for learning dynamical systems from high-dimensional empirical data. We focus on the setting in which data are available from multiple different instances of a system. We study behaviour through simple theoretical analyses and extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-21T07:52:07Z)
Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow(MANF) Our model avoids the influence of cumulative error and does not increase the time complexity. Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
Variational Hyper RNN for Sequence Modeling [69.0659591456772]
We propose a novel probabilistic sequence model that excels at capturing high variability in time series data. Our method uses temporal latent variables to capture information about the underlying data pattern. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data.
arXiv Detail & Related papers (2020-02-24T19:30:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.