Revisiting the Encoding of Satellite Image Time Series
- URL: http://arxiv.org/abs/2305.02086v2
- Date: Fri, 8 Sep 2023 11:58:41 GMT
- Title: Revisiting the Encoding of Satellite Image Time Series
- Authors: Xin Cai, Yaxin Bi, Peter Nicholl, and Roy Sterritt
- Abstract summary: Satellite Image Time Series (SITS) representation learning is complex due to high spatiotemporal resolutions and irregular acquisition times.
We develop a novel perspective of SITS processing as a direct set prediction problem, inspired by the recent trend in adopting query-based transformer decoders.
We attain new state-of-the-art (SOTA) results on the PASTIS benchmark dataset.
- Score: 2.5874041837241304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Satellite Image Time Series (SITS) representation learning is complex due to
high spatiotemporal resolutions, irregular acquisition times, and intricate
spatiotemporal interactions. These challenges result in specialized neural
network architectures tailored for SITS analysis. The field has witnessed
promising results achieved by pioneering researchers, but transferring the
latest advances or established paradigms from Computer Vision (CV) to SITS is
still highly challenging due to the existing suboptimal representation learning
framework. In this paper, we develop a novel perspective of SITS processing as
a direct set prediction problem, inspired by the recent trend in adopting
query-based transformer decoders to streamline the object detection or image
segmentation pipeline. We further propose to decompose the representation
learning process of SITS into three explicit steps: collect-update-distribute,
which is computationally efficient and well suited to irregularly sampled and
asynchronous temporal satellite observations. Facilitated by the unique
reformulation, our proposed temporal learning backbone of SITS, initially
pre-trained on the resource efficient pixel-set format and then fine-tuned on
the downstream dense prediction tasks, has attained new state-of-the-art (SOTA)
results on the PASTIS benchmark dataset. Specifically, the clear separation
between temporal and spatial components in the semantic/panoptic segmentation
pipeline of SITS allows us to leverage the latest advances in CV, such as the
universal image segmentation architecture, resulting in a noticeable 2.5-point
increase in mIoU and 8.8-point increase in PQ, respectively, compared to the
best scores reported so far.
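
The abstract names three steps (collect, update, distribute) without giving their details. The sketch below is one possible reading of those names, assuming a small set of query vectors that gather information from the temporal observations, interact with each other, and are then read back by each observation. It is not the authors' architecture: the function names `attend` and `collect_update_distribute`, the toy inputs, and the use of plain scaled dot-product attention with no learned projections, residual connections, or temporal encodings are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention: returns one output vector per query."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def collect_update_distribute(observations, queries):
    """Hypothetical three-step encoder for one pixel's time series.

    observations: T feature vectors, one per acquisition (T may vary).
    queries: K vectors (fixed here; they would be learned in practice).
    """
    # Collect: each of the K queries gathers information from all T observations.
    collected = attend(queries, observations, observations)
    # Update: the K collected vectors exchange information among themselves.
    updated = attend(collected, collected, collected)
    # Distribute: each observation reads back from the K updated vectors.
    return attend(observations, updated, updated)


# Toy series: 5 acquisitions with 2 features each, and 2 queries.
observations = [[0.1, 0.3], [0.2, 0.5], [0.6, 0.4], [0.7, 0.2], [0.3, 0.1]]
queries = [[1.0, 0.0], [0.0, 1.0]]
encoded = collect_update_distribute(observations, queries)
print(len(encoded), len(encoded[0]))
```

Under this reading, the observations only ever attend to the K queries, so the cost grows with T times K rather than T squared, and a series with a different number of acquisitions can be passed in unchanged. Whether the paper obtains its efficiency in this way is not stated in the abstract.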
Related papers
- Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series [0.0]
We propose the ALIgned SITS Encoder (ALISE), which takes into account the spatial, spectral, and temporal dimensions of satellite imagery.
Unlike SSL models currently available for SITS, ALISE incorporates a flexible query mechanism to project the SITS into a common and learned temporal projection space.
The quality of the produced representation is assessed through three downstream tasks: crop segmentation (PASTIS), land cover segmentation (MultiSenGE) and a novel crop change detection dataset.
arXiv Detail & Related papers (2024-07-11T12:42:10Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S^2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning [7.4106801792345705]
We propose DE-TSMCL, an innovative distillation enhanced framework for long sequence time series forecasting.
Specifically, we design a learnable data augmentation mechanism which adaptively learns whether to mask a timestamp.
Then, we propose a contrastive learning task with momentum update to explore inter-sample and intra-temporal correlations of time series.
By combining losses from multiple tasks, we can learn effective representations for the downstream forecasting task.
arXiv Detail & Related papers (2024-01-31T12:52:10Z)
- ViTs for SITS: Vision Transformers for Satellite Image Time Series [52.012084080257544]
We introduce a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT).
TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder.
arXiv Detail & Related papers (2023-01-12T11:33:07Z)
- SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE).
To leverage temporal information, we include a temporal embedding along with independently masking image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z)
- STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to solve the above two problems.
The proposed model aims to preserve the spatiotemporal information for videos during the feature extraction and the state transitions.
Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z)
- Investigating Temporal Convolutional Neural Networks for Satellite Image Time Series Classification: A survey [0.0]
Temporal CNNs have been employed for SITS classification tasks with encouraging results.
This paper seeks to survey this method against a plethora of other contemporary methods for SITS classification to validate the existing findings in recent literature.
Experiments on two benchmark SITS datasets show that Temporal CNNs outperform the benchmark algorithms they are compared against.
arXiv Detail & Related papers (2022-04-13T14:08:14Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Unsupervised Monocular Depth Learning with Integrated Intrinsics and Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z)