FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals
in Factorized Orthogonal Latent Space
- URL: http://arxiv.org/abs/2310.20071v1
- Date: Mon, 30 Oct 2023 22:55:29 GMT
- Title: FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals
in Factorized Orthogonal Latent Space
- Authors: Shengzhong Liu, Tomoyoshi Kimura, Dongxin Liu, Ruijie Wang, Jinyang
Li, Suhas Diggavi, Mani Srivastava, Tarek Abdelzaher
- Abstract summary: This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals.
It consistently outperforms the state-of-the-art baselines in downstream tasks by a clear margin.
- Score: 7.324708513042455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a novel contrastive learning framework, called FOCAL, for
extracting comprehensive features from multimodal time-series sensing signals
through self-supervised training. Existing multimodal contrastive frameworks
mostly rely on the information shared between sensory modalities, but do not
explicitly consider the modality-exclusive information that could be critical
to understanding the underlying sensing physics. In addition, contrastive
frameworks for time series have not handled temporal information locality
appropriately. FOCAL addresses these challenges through the following
contributions: First, given multimodal time series, it encodes each modality
into a factorized latent space consisting of shared features and private
features that are orthogonal to each other. The shared space emphasizes feature
patterns consistent across sensory modalities through a modal-matching
objective. In contrast, the private space extracts modality-exclusive
information through a transformation-invariant objective. Second, we propose a
temporal structural constraint for modality features, such that the average
distance between temporally neighboring samples is no larger than that of
temporally distant samples. Extensive evaluations are performed on four
multimodal sensing datasets with two backbone encoders and two classifiers to
demonstrate the superiority of FOCAL. It consistently outperforms the
state-of-the-art baselines in downstream tasks by a clear margin, under
different ratios of available labels. The code and self-collected dataset are
available at https://github.com/tomoyoshki/focal.
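To make the training objectives concrete, below is a minimal PyTorch-style sketch of the three kinds of losses the abstract describes: an InfoNCE term (used for modal matching in the shared space and for transformation invariance in the private space), an orthogonality penalty between the shared and private embeddings, and the temporal ranking constraint on neighboring versus distant samples. This is an illustrative reading of the abstract, not the authors' implementation (see the linked repository for that); the function names, the temperature and margin defaults, and the choice of squared Euclidean distance are assumptions.

```python
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Batch-wise InfoNCE: row i of `positive` is the positive for row i of
    `anchor`; all other rows act as negatives. In the shared space, the two
    inputs would be embeddings of the same sample from two modalities; in the
    private space, embeddings of two augmented views of one modality."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


def orthogonality_loss(shared: torch.Tensor, private: torch.Tensor) -> torch.Tensor:
    """Encourage each sample's shared and private embeddings to be orthogonal
    by penalizing their squared cosine similarity."""
    shared = F.normalize(shared, dim=-1)
    private = F.normalize(private, dim=-1)
    return (shared * private).sum(dim=-1).pow(2).mean()


def temporal_ranking_loss(z_anchor: torch.Tensor, z_near: torch.Tensor,
                          z_far: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Temporal structural constraint: a sample's distance to a temporally
    neighboring sample should be no larger than its distance to a temporally
    distant one."""
    d_near = (z_anchor - z_near).pow(2).sum(dim=-1)
    d_far = (z_anchor - z_far).pow(2).sum(dim=-1)
    return F.relu(d_near - d_far + margin).mean()
```

In a full training loop these terms would be computed per modality and combined with weighting coefficients; the exact weights and the sampling of near/far pairs are details deferred to the released code.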
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric encapsulates two formats of terms, diagonal and block-diagonal.
Experiments on cross-modal retrieval and two additional uni-modal retrieval tasks validate its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Semantic-Guided Multimodal Sentiment Decoding with Adversarial Temporal-Invariant Learning [22.54577327204281]
Multimodal sentiment analysis aims to learn representations from different modalities to identify human emotions.
Existing works often neglect the frame-level redundancy inherent in continuous time series, resulting in noisy, incomplete modality representations.
We propose temporal-invariant learning for the first time, constraining distributional variation across time steps to effectively capture long-term temporal dynamics.
arXiv Detail & Related papers (2024-08-30T03:28:40Z)
- Multimodal generative semantic communication based on latent diffusion model [13.035207938169844]
This paper introduces a multimodal generative semantic communication framework named mm-GESCO.
The framework ingests streams of visible and infrared image data, generates fused semantic segmentation maps, and transmits them.
At the receiving end, the framework can reconstruct the original multimodal images based on the semantic maps.
arXiv Detail & Related papers (2024-08-10T06:23:41Z)
- Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z)
- Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition [10.403751563214113]
STD-CL is a framework for obtaining discriminative and semantically distinct representations from skeleton sequences.
STD-CL achieves solid improvements on NTU60, NTU120, and NW-UCLA benchmarks.
arXiv Detail & Related papers (2023-12-23T02:54:41Z)
- Graph-Aware Contrasting for Multivariate Time-Series Classification [50.84488941336865]
Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques.
We propose Graph-Aware Contrasting for spatial consistency across MTS data.
Our proposed method achieves state-of-the-art performance on various MTS classification tasks.
arXiv Detail & Related papers (2023-09-11T02:35:22Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Robust Detection of Lead-Lag Relationships in Lagged Multi-Factor Models [61.10851158749843]
Key insights can be obtained by discovering lead-lag relationships inherent in the data.
We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models.
arXiv Detail & Related papers (2023-05-11T10:30:35Z)
- Learning Appearance-motion Normality for Video Anomaly Detection [11.658792932975652]
We propose a two-stream auto-encoder framework augmented with spatial-temporal memories.
It learns appearance normality and motion normality independently and explores their correlations via adversarial learning.
Our framework outperforms the state-of-the-art methods, achieving AUCs of 98.1% and 89.8% on UCSD Ped2 and CUHK Avenue datasets.
arXiv Detail & Related papers (2022-07-27T08:30:19Z)
- Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding [8.45908939323268]
We propose a self-supervised framework for learning generalizable representations for non-stationary time series.
Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable.
arXiv Detail & Related papers (2021-06-01T19:53:24Z)
- Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks [91.65637773358347]
We propose a general graph neural network framework designed specifically for multivariate time series data.
Our approach automatically extracts the uni-directed relations among variables through a graph learning module.
Our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets.
arXiv Detail & Related papers (2020-05-24T04:02:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.