Related papers: Quantizing Space and Time: Fusing Time Series and Images for Earth Observation

URL: http://arxiv.org/abs/2510.23118v3
Date: Wed, 29 Oct 2025 15:24:05 GMT
Title: Quantizing Space and Time: Fusing Time Series and Images for Earth Observation
Authors: Gianfranco Basile, Johannes Jakubik, Benedikt Blumenstiel, Thomas Brunschwiler, Juan Bernabe Moreno,
Abstract summary: We propose a task-agnostic framework for multimodal fusion of time series and single timestamp images. Our approach explores deterministic and learned strategies for time series quantization. Our model generates consistent global temperature profiles from satellite imagery.
Score: 4.012968772806928
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a task-agnostic framework for multimodal fusion of time series and single timestamp images, enabling cross-modal generation and robust downstream performance. Our approach explores deterministic and learned strategies for time series quantization and then leverages a masked correlation learning objective, aligning discrete image and time series tokens in a unified representation space. Instantiated in the Earth observation domain, the pretrained model generates consistent global temperature profiles from satellite imagery and is validated through counterfactual experiments. Across downstream tasks, our task-agnostic pretraining outperforms task-specific fusion by 6% in R^2 and 2% in RMSE on average, and exceeds baseline methods by 50% in R^2 and 12% in RMSE. Finally, we analyze gradient sensitivity across modalities, providing insights into model robustness. Code, data, and weights will be released under a permissive license.
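The abstract mentions deterministic time series quantization and a masked objective over discrete image and time series tokens, but gives no details. The following is a minimal sketch that assumes the simplest deterministic quantizer (uniform min-max binning) and a generic random-masking step over one joint token sequence. `quantize_uniform`, `dequantize`, `mask_joint_sequence`, `MASK_ID`, and the vocabulary-offset scheme are hypothetical names and choices. They are not the authors' code and not their masked correlation learning objective.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration only; not taken from the paper's code.
MASK_ID = -1  # hypothetical sentinel id marking a masked position


def quantize_uniform(series: Sequence[float], num_bins: int) -> List[int]:
    """Map each value to one of `num_bins` equal-width bins spanning the series' min-max range."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)  # constant series: every value falls into bin 0
    width = (hi - lo) / num_bins
    # Clamp so the maximum value lands in the last bin rather than at index num_bins.
    return [min(int((x - lo) / width), num_bins - 1) for x in series]


def dequantize(tokens: Sequence[int], lo: float, hi: float, num_bins: int) -> List[float]:
    """Approximate the original values by the centre of each token's bin."""
    width = (hi - lo) / num_bins
    return [lo + (t + 0.5) * width for t in tokens]


def mask_joint_sequence(
    image_tokens: Sequence[int],
    ts_tokens: Sequence[int],
    image_vocab_size: int,
    mask_ratio: float,
    rng: random.Random,
) -> Tuple[List[int], Dict[int, int]]:
    """Concatenate both modalities into one token sequence and hide a random subset of positions."""
    # Offset time series ids so the two modalities share one vocabulary without collisions.
    joint = list(image_tokens) + [image_vocab_size + t for t in ts_tokens]
    n_mask = int(round(mask_ratio * len(joint)))
    hidden = set(rng.sample(range(len(joint)), n_mask))
    inputs = [MASK_ID if i in hidden else tok for i, tok in enumerate(joint)]
    targets = {i: joint[i] for i in hidden}  # tokens a model would be trained to predict
    return inputs, targets


temps = [10.0, 12.5, 15.0, 20.0]
tokens = quantize_uniform(temps, num_bins=4)  # [0, 1, 2, 3]
inputs, targets = mask_joint_sequence(
    [5, 7], tokens, image_vocab_size=8, mask_ratio=0.5, rng=random.Random(0)
)
```

Uniform binning is only the most basic deterministic option. The abstract also mentions learned quantization strategies, which this sketch does not attempt.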

Related papers

Vision-LLMs for Spatiotemporal Traffic Forecasting [14.700408329373998]
Large Language Models (LLMs) inherently struggle to model the complex spatial dependencies of grid-based traffic data. We propose ST-Vision-LLM, a novel framework that reframes spatiotemporal forecasting as a vision-language fusion problem. We show that ST-Vision-LLM outperforms existing methods by 15.6% in long-term prediction accuracy and exceeds the second-best baseline by over 30.04% in cross-domain scenarios.
arXiv Detail & Related papers (2025-10-13T11:15:56Z)
Towards Anytime Retrieval: A Benchmark for Anytime Person Re-Identification [85.78039373517021]
Anytime Person Re-identification (AT-ReID) aims to achieve effective retrieval in multiple scenarios based on variations in time. We collect the first large-scale dataset, AT-USTC, which contains 403k images of individuals wearing multiple outfits. We propose a unified model named Uni-AT, which comprises a multi-scenario ReID framework for scenario-specific feature learning.
arXiv Detail & Related papers (2025-09-20T11:20:22Z)
Leveraging Intermediate Representations of Time Series Foundation Models for Anomaly Detection [0.0]
Time series foundation models (TSFMs) have emerged as a powerful tool for anomaly detection. We propose TimeRep, a novel anomaly detection approach that leverages the intermediate-layer representations of TSFMs. TimeRep consistently outperforms a broad spectrum of state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-16T04:10:17Z)
MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning [25.381211868583826]
We propose a multi-modal self-supervised learning framework that leverages high-resolution RGB images, multi-spectral data, and digital surface models (DSM) for pre-training. We evaluate the proposed method on multiple downstream tasks, covering typical remote sensing applications such as scene classification, semantic segmentation, change detection, object detection, and depth estimation.
arXiv Detail & Related papers (2025-06-11T02:01:36Z)
TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation [2.4454605633840143]
Time series generation is a crucial research topic in the area of decision-making systems. Recent approaches focus on learning in the data space to model time series information. We propose TimeLDM, a novel latent diffusion model for high-quality time series generation.
arXiv Detail & Related papers (2024-07-05T01:47:20Z)
TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification [13.110156202816112]
We propose a novel multi-view approach to capture patterns with properties like shift equivariance. Our method integrates diverse features, including spectral, temporal, local, and global features, to obtain rich, complementary contexts for time series classification (TSC). Our approach achieves average accuracy improvements of 4.01-6.45% and 7.93%, respectively, over leading TSC models.
arXiv Detail & Related papers (2024-06-06T18:05:10Z)
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
Motivated by increasing privacy concerns, we propose a Parameter-Efficient Federated Anomaly Detection framework named PeFAD. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and an anomaly scorer to detect anomalies. Our experimental results show that GST-Pro can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
Robust Detection of Lead-Lag Relationships in Lagged Multi-Factor Models [61.10851158749843]
Key insights can be obtained by discovering lead-lag relationships inherent in the data. We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models.
arXiv Detail & Related papers (2023-05-11T10:30:35Z)
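The central notion in the entry above, one series leading another by a fixed lag, can be illustrated with a plain lagged cross-correlation scan. This is a swapped-in textbook technique, not the paper's clustering-driven methodology for lagged multi-factor models. `pearson` and `best_lag` are hypothetical helper names.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers: a simple lagged-correlation scan, NOT the paper's clustering method.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(leader: Sequence[float], follower: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Find the lag k in [0, max_lag] maximising corr(leader[t], follower[t + k])."""
    best_k, best_c = 0, -math.inf
    for k in range(max_lag + 1):
        # Align leader at time t with follower at time t + k.
        c = pearson(leader[: len(leader) - k], follower[k:])
        if c > best_c:
            best_k, best_c = k, c
    return best_k, best_c


leader = [0.0, 3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0]
follower = [0.0, 0.0] + leader[:-2]  # the same signal delayed by two steps
lag, corr = best_lag(leader, follower, max_lag=4)
```

A pairwise scan like this becomes noisy and expensive across many assets, which is the kind of setting where a more robust, clustering-based approach such as the one the entry describes is aimed.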
Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representations for dense prediction tasks. Our method is motivated by three key factors in detection: localization, scale consistency, and recognition. Our method consistently outperforms recent state-of-the-art methods on various datasets by significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection [2.387411589813086]
This paper proposes an anomaly detection method based on unsupervised Short- and Long-term Mask Representation learning (SLMR). Experiments show that our method outperforms other state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2022-08-19T09:34:11Z)
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on the Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding and mask image patches independently across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z)
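The SatMAE entry mentions a temporal embedding and masking image patches independently across time. A minimal sketch of both ideas: each frame draws its own random patch mask, so different frames expose different patch positions (unlike masking the same positions in every frame). The sin/cos temporal embedding is an assumption borrowed from the standard Transformer positional encoding. `independent_masks` and `sinusoidal_embedding` are hypothetical names, not SatMAE's code.

```python
import math
import random
from typing import List

# Hypothetical helpers illustrating per-frame masking; not SatMAE's implementation.


def independent_masks(
    num_frames: int, num_patches: int, mask_ratio: float, rng: random.Random
) -> List[List[bool]]:
    """Draw a separate random patch mask for each timestep (True = masked)."""
    n_mask = int(round(mask_ratio * num_patches))
    masks = []
    for _ in range(num_frames):
        hidden = set(rng.sample(range(num_patches), n_mask))
        masks.append([p in hidden for p in range(num_patches)])
    return masks


def sinusoidal_embedding(position: float, dim: int) -> List[float]:
    """Transformer-style sin/cos encoding, assumed here as the temporal embedding."""
    emb: List[float] = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        emb.extend([math.sin(position * freq), math.cos(position * freq)])
    return emb[:dim]  # trim the extra value when dim is odd


masks = independent_masks(num_frames=3, num_patches=16, mask_ratio=0.75, rng=random.Random(42))
frame_embeddings = [sinusoidal_embedding(t, dim=8) for t in range(3)]
```

In a full model, each visible patch token of frame `t` would have `frame_embeddings[t]` added to it (alongside a spatial position embedding) before entering the encoder.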
Time Series Anomaly Detection by Cumulative Radon Features [32.36217153362305]
In this work, we argue that shallow features suffice when combined with distribution distance measures. Our approach models each time series as a high dimensional empirical distribution of features, where each time-point constitutes a single sample. We show that by parameterizing each time series using cumulative Radon features, we are able to efficiently and effectively model the distribution of normal time series.
arXiv Detail & Related papers (2022-02-08T18:58:53Z)
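The entry above treats each time series as an empirical distribution of per-time-point feature vectors summarised by cumulative Radon features. A minimal sketch of that idea, assuming random unit projection directions (the Radon part), mid-point quantile levels of each projected distribution (the cumulative part), and a mean absolute quantile gap as the distance, which is a sliced Wasserstein-1 style choice. The paper's exact features and distance may differ, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers; an illustration of the idea, not the paper's implementation.
Point = Sequence[float]


def random_directions(dim: int, count: int, rng: random.Random) -> List[List[float]]:
    """Sample unit vectors uniformly on the sphere (normalised Gaussian draws)."""
    dirs = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        dirs.append([x / norm for x in v])
    return dirs


def cumulative_radon_features(
    points: Sequence[Point], directions: Sequence[Sequence[float]], num_quantiles: int
) -> List[float]:
    """Project every time point onto each direction and record the empirical quantile function."""
    feats: List[float] = []
    for d in directions:
        proj = sorted(sum(a * b for a, b in zip(p, d)) for p in points)
        n = len(proj)
        for q in range(num_quantiles):
            level = (q + 0.5) / num_quantiles  # mid-point quantile levels
            feats.append(proj[min(int(level * n), n - 1)])
    return feats


def feature_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Mean absolute gap between quantile functions (sliced Wasserstein-1 style)."""
    return sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)


dirs = random_directions(dim=2, count=8, rng=random.Random(0))
normal = [(math.sin(t / 3), math.cos(t / 3)) for t in range(50)]
query = [(math.sin(t / 3) + 2.0, math.cos(t / 3)) for t in range(50)]  # shifted series
score = feature_distance(
    cumulative_radon_features(normal, dirs, 10), cumulative_radon_features(query, dirs, 10)
)
```

Because each time point is one sample, the ordering of points within a series does not affect the features; only the distribution of per-time-point features does, which matches the entry's framing.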
Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning [54.90704746573636]
Our method does not require manual selection of key frames, and produces state-of-the-art results with as little as 2% of annotated frames. We experimentally validate that our method outperforms existing methods when working with as little as 2% of randomly chosen data.
arXiv Detail & Related papers (2020-11-03T17:35:57Z)
A Multi-Channel Neural Graphical Event Model with Negative Evidence [76.51278722190607]
Event datasets are sequences of events of various types occurring irregularly over the time-line. We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions.
arXiv Detail & Related papers (2020-02-21T23:10:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.