Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications
- URL: http://arxiv.org/abs/2407.19660v2
- Date: Wed, 16 Oct 2024 21:18:10 GMT
- Title: Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications
- Authors: Praveen Ravirathinam, Ankush Khandelwal, Rahul Ghosh, Vipin Kumar
- Abstract summary: We present a foundation model framework, where the pretraining task captures the causal relationship between multiple modalities.
Our method, called MultiModal Variable Step Forecasting (MM-VSF), uses forecasting of satellite imagery as a pretraining task.
- Score: 16.824262496666893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, there has been increased interest in foundation models for geoscience due to the vast amount of Earth-observing satellite imagery. Existing remote sensing foundation models make use of various sources of spectral imagery to create large models pretrained on the task of masked reconstruction. In this paper, we present a foundation model framework in which the pretraining task captures the causal relationship between multiple modalities. Our framework leverages the knowledge-guided principles that spectral imagery captures the impact of physical drivers on the environmental system, and that the relationship between them is governed by the characteristics of the system. Specifically, our method, called MultiModal Variable Step Forecasting (MM-VSF), uses forecasting of satellite imagery as a pretraining task and is able to capture the causal relationship between spectral imagery and weather. In our evaluation we show that forecasting satellite imagery using weather can serve as an effective pretraining task for foundation models. We further show the effectiveness of the embeddings produced by MM-VSF on the downstream tasks of pixel-wise crop mapping and missing image prediction of spectral imagery, compared with embeddings created by models trained in alternative pretraining settings, including traditional single-modality masked reconstruction.
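The variable-step forecasting setup described above can be illustrated with a minimal data-construction sketch. This is an illustrative toy, not the paper's implementation: the function name, array shapes, and sampling scheme are assumptions. For each anchor timestep, a random horizon k is drawn, and the model would be pretrained to forecast the image k steps ahead from the current image and the intervening weather window.

```python
# Minimal sketch of building variable-step forecasting pretraining triples,
# assuming a toy co-registered imagery/weather time series. Names and shapes
# are illustrative, not taken from the MM-VSF paper.
import numpy as np

def make_vsf_triples(images, weather, max_step, rng):
    """Build (input image, weather window, target image) triples.

    images:  (T, H, W, C) satellite imagery time series
    weather: (T, D) weather covariates aligned with the imagery
    For each anchor t, a random step k in [1, max_step] is drawn; a model
    would then be pretrained to predict images[t + k] from images[t] and
    weather[t : t + k + 1].
    """
    T = images.shape[0]
    triples = []
    for t in range(T - max_step):
        k = int(rng.integers(1, max_step + 1))
        triples.append((images[t], weather[t:t + k + 1], images[t + k]))
    return triples

rng = np.random.default_rng(0)
images = rng.normal(size=(10, 8, 8, 4))   # 10 timesteps, 8x8 pixels, 4 bands
weather = rng.normal(size=(10, 5))        # 5 weather variables per timestep
triples = make_vsf_triples(images, weather, max_step=3, rng=rng)
print(len(triples))  # one triple per valid anchor timestep → 7
```

Because the forecasting horizon varies per sample, the pretrained model must relate the weather driver sequence to the imagery change rather than memorize a fixed lag, which is the causal relationship the abstract emphasizes.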
Related papers
- Improving satellite imagery segmentation using multiple Sentinel-2 revisits [0.0]
We explore the best way to use revisits in the framework of fine-tuning pre-trained remote sensing models.
We find that fusing representations from multiple revisits in the model latent space is superior to other methods of using revisits.
A SWIN Transformer-based architecture performs better than U-nets and ViT-based models.
arXiv Detail & Related papers (2024-09-25T21:13:33Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning [35.39822183728463]
We present a novel Prompt-IML framework for detecting tampered images.
Humans tend to discern authenticity of an image based on semantic and high-frequency information.
Our model can achieve better performance on eight typical fake image datasets.
arXiv Detail & Related papers (2024-01-01T03:45:07Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal [97.53040662243768]
We propose a CLIP embedding module to make the network handle different weather conditions adaptively.
This module integrates the sample specific weather prior extracted by CLIP image encoder together with the distribution specific information learned by a set of parameters.
arXiv Detail & Related papers (2023-06-15T10:06:13Z) - Multi-modal learning for geospatial vegetation forecasting [1.8180482634934092]
We introduce GreenEarthNet, the first dataset specifically designed for high-resolution vegetation forecasting.
We also present Contextformer, a novel deep learning approach for predicting vegetation greenness from Sentinel 2 satellite images.
To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution able to capture anomalies beyond the seasonal cycle.
arXiv Detail & Related papers (2023-03-28T17:59:05Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z) - SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE).
To leverage temporal information, we include a temporal embedding along with independently masking image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z) - A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration [36.525810477650026]
Hyperspectral imaging offers new perspectives for diverse applications.
The lack of accurate ground-truth "clean" hyperspectral signals on the spot makes restoration tasks challenging.
In this paper, we advocate for a hybrid approach based on sparse coding principles.
arXiv Detail & Related papers (2021-11-18T14:16:04Z) - Contrastive Multiview Coding with Electro-optics for SAR Semantic Segmentation [0.6445605125467573]
We propose multi-modal representation learning for SAR semantic segmentation.
Unlike previous studies, our method jointly uses EO imagery, SAR imagery, and a label mask.
Several experiments show that our approach is superior to the existing methods in model performance, sample efficiency, and convergence speed.
arXiv Detail & Related papers (2021-08-31T23:55:41Z) - Enhancing Photorealism Enhancement [83.88433283714461]
We present an approach to enhancing the realism of synthetic images using a convolutional network.
We analyze scene layout distributions in commonly used datasets and find that they differ in important ways.
We report substantial gains in stability and realism in comparison to recent image-to-image translation methods.
arXiv Detail & Related papers (2021-05-10T19:00:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.