A Causally Informed Pretraining Approach for Multimodal Foundation Models: Applications in Remote Sensing
- URL: http://arxiv.org/abs/2407.19660v3
- Date: Tue, 18 Feb 2025 03:39:37 GMT
- Title: A Causally Informed Pretraining Approach for Multimodal Foundation Models: Applications in Remote Sensing
- Authors: Praveen Ravirathinam, Ankush Khandelwal, Rahul Ghosh, Vipin Kumar,
- Abstract summary: Self-supervised learning has emerged as a powerful paradigm for pretraining foundation models using large-scale data.<n>We propose Causally Informed Variable-Step Forecasting (CI-VSF), a novel pretraining task that models forecasting as a conditional generation task.<n>We demonstrate that pretraining in such a fashion leads to enhanced performance when finetuned on both prediction and forecasting.
- Score: 16.824262496666893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning has emerged as a powerful paradigm for pretraining foundation models using large-scale data. Existing pretraining approaches predominantly rely on masked reconstruction or next-token prediction strategies, demonstrating strong performance across various downstream tasks, including geoscience applications. However, these approaches do not fully capture the causal interplay between different geospatial and environmental variables. To address this limitation, we propose Causally Informed Variable-Step Forecasting (CI-VSF), a novel pretraining task that models forecasting as a conditional generation task, where driver variables (e.g., weather) inform the prediction of response variables (e.g., satellite imagery). We demonstrate that pretraining in such a fashion leads to enhanced performance when finetuned on both prediction (e.g., crop mapping, missing image prediction, soil moisture estimation) and forecasting (e.g., future image forecasting, soil moisture forecasting) downstream tasks when compared to other pretraining approaches. While we use remote sensing as our main application to demonstrate the efficacy of our proposed pretraining strategy over existing paradigms, it is applicable to any domain that involves known causal relationships amongst a set of variables.
Related papers
- Goal-Oriented Time-Series Forecasting: Foundation Framework Design [11.999600538978044]
Time-series forecasting often focuses only on minimizing prediction errors, ignoring the specific requirements of real-world applications.
This paper presents a new training methodology, which allows a forecasting model to dynamically adjust its focus based on the importance of forecast ranges specified by the end application.
arXiv Detail & Related papers (2025-04-24T12:34:43Z) - On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z) - Improving satellite imagery segmentation using multiple Sentinel-2 revisits [0.0]
We explore the best way to use revisits in the framework of fine-tuning pre-trained remote sensing models.
We find that fusing representations from multiple revisits in the model latent space is superior to other methods of using revisits.
A SWIN Transformer-based architecture performs better than U-nets and ViT-based models.
arXiv Detail & Related papers (2024-09-25T21:13:33Z) - Motion Forecasting via Model-Based Risk Minimization [8.766024024417316]
We propose a novel sampling method applicable to trajectory prediction based on the predictions of multiple models.
We first show that conventional sampling based on predicted probabilities can degrade performance due to missing alignment between models.
By using state-of-the-art models as base learners, our approach constructs diverse and effective ensembles for optimal trajectory sampling.
arXiv Detail & Related papers (2024-09-16T09:03:28Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation
Models Through Prompt Tuning [35.39822183728463]
We present a novel Prompt-IML framework for detecting tampered images.
Humans tend to discern authenticity of an image based on semantic and high-frequency information.
Our model can achieve better performance on eight typical fake image datasets.
arXiv Detail & Related papers (2024-01-01T03:45:07Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Difusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - PreDiff: Precipitation Nowcasting with Latent Diffusion Models [28.52267957954304]
We develop a conditional latent diffusion model capable of probabilistic forecasts.
We incorporate an explicit knowledge alignment mechanism to align forecasts with domain-specific physical constraints.
arXiv Detail & Related papers (2023-07-19T19:19:13Z) - Towards Motion Forecasting with Real-World Perception Inputs: Are
End-to-End Approaches Competitive? [93.10694819127608]
We propose a unified evaluation pipeline for forecasting methods with real-world perception inputs.
Our in-depth study uncovers a substantial performance gap when transitioning from curated to perception-based data.
arXiv Detail & Related papers (2023-06-15T17:03:14Z) - Exploring the Application of Large-scale Pre-trained Models on Adverse
Weather Removal [97.53040662243768]
We propose a CLIP embedding module to make the network handle different weather conditions adaptively.
This module integrates the sample specific weather prior extracted by CLIP image encoder together with the distribution specific information learned by a set of parameters.
arXiv Detail & Related papers (2023-06-15T10:06:13Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask
Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - Multi-modal learning for geospatial vegetation forecasting [1.8180482634934092]
We introduce GreenEarthNet, the first dataset specifically designed for high-resolution vegetation forecasting.
We also present Contextformer, a novel deep learning approach for predicting vegetation greenness from Sentinel 2 satellite images.
To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution able to capture anomalies beyond the seasonal cycle.
arXiv Detail & Related papers (2023-03-28T17:59:05Z) - Towards Out-of-Distribution Sequential Event Prediction: A Causal
Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, the next-event prediction models are trained with sequential data collected at one time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects
in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z) - SatMAE: Pre-training Transformers for Temporal and Multi-Spectral
Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE)
To leverage temporal information, we include a temporal embedding along with independently masking image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z) - Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at pedestrians positions prediction in urban-like environments.
Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion.
We show that without explicit introduction of social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce on par results with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z) - A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image
Restoration [36.525810477650026]
Hyperspectral imaging offers new perspectives for diverse applications.
The lack of accurate ground-truth "clean" hyperspectral signals on the spot makes restoration tasks challenging.
In this paper, we advocate for a hybrid approach based on sparse coding principles.
arXiv Detail & Related papers (2021-11-18T14:16:04Z) - Contrastive Multiview Coding with Electro-optics for SAR Semantic
Segmentation [0.6445605125467573]
We propose multi-modal representation learning for SAR semantic segmentation.
Unlike previous studies, our method jointly uses EO imagery, SAR imagery, and a label mask.
Several experiments show that our approach is superior to the existing methods in model performance, sample efficiency, and convergence speed.
arXiv Detail & Related papers (2021-08-31T23:55:41Z) - RAIN: Reinforced Hybrid Attention Inference Network for Motion
Forecasting [34.54878390622877]
We propose a generic motion forecasting framework with dynamic key information selection and ranking based on a hybrid attention mechanism.
The framework is instantiated to handle multi-agent trajectory prediction and human motion forecasting tasks.
We validate the framework on both synthetic simulations and motion forecasting benchmarks in different domains.
arXiv Detail & Related papers (2021-08-03T06:30:30Z) - Enhancing Photorealism Enhancement [83.88433283714461]
We present an approach to enhancing the realism of synthetic images using a convolutional network.
We analyze scene layout distributions in commonly used datasets and find that they differ in important ways.
We report substantial gains in stability and realism in comparison to recent image-to-image translation methods.
arXiv Detail & Related papers (2021-05-10T19:00:49Z) - SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction.
multimodality in both training data and predictions and constant time inference regardless of number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z) - Bridging the Gap Between Training and Inference for Spatio-Temporal
Forecasting [16.06369357595426]
We propose a novel curriculum learning based strategy named Temporal Progressive Growing Sampling to bridge the gap between training and inference for S-temporal sequence forecasting.
Experimental results demonstrate that our proposed method better models long term dependencies and outperforms baseline approaches on two competitive datasets.
arXiv Detail & Related papers (2020-05-19T10:14:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.