Towards Temporal Change Explanations from Bi-Temporal Satellite Images
- URL: http://arxiv.org/abs/2407.09548v1
- Date: Thu, 27 Jun 2024 12:49:22 GMT
- Title: Towards Temporal Change Explanations from Bi-Temporal Satellite Images
- Authors: Ryo Tsujimoto, Hiroki Ouchi, Hidetaka Kamigaito, Taro Watanabe
- Abstract summary: We investigate the ability of Large-scale Vision-Language Models to explain temporal changes between satellite images.
We propose three prompting methods to deal with a pair of satellite images as input.
Through human evaluation, we demonstrate the effectiveness of our step-by-step reasoning-based prompting.
- Score: 28.445851360368803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring. However, manual dataset construction for the task is costly, so human-AI collaboration is promising. Toward this direction, in this paper, we investigate the ability of Large-scale Vision-Language Models (LVLMs) to explain temporal changes between satellite images. While LVLMs are known to generate good image captions, they receive only a single image as input. To deal with a pair of satellite images as input, we propose three prompting methods. Through human evaluation, we demonstrate the effectiveness of our step-by-step reasoning-based prompting.
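The abstract does not spell out the three prompting methods, but one natural step-by-step scheme for a single-image LVLM is to caption each image of the pair independently and then reason over the two captions. The following is a minimal sketch of that idea; the function names, prompt template, and `caption_fn` hook are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: step-by-step prompting for bi-temporal change explanation.
# The prompt wording and helper names below are assumptions for illustration.

def build_step_by_step_prompt(caption_before: str, caption_after: str) -> str:
    """Compose a step-by-step reasoning prompt from two per-image captions."""
    return (
        "You are analyzing two satellite images of the same location.\n"
        f"Step 1 - earlier image description: {caption_before}\n"
        f"Step 2 - later image description: {caption_after}\n"
        "Step 3 - list the differences between the two descriptions.\n"
        "Step 4 - explain the temporal change these differences imply."
    )

def explain_change(image_before, image_after, caption_fn) -> str:
    """caption_fn stands in for a single-image LVLM captioner; the returned
    prompt would, in practice, be sent to a language model for the final
    change explanation."""
    return build_step_by_step_prompt(caption_fn(image_before),
                                     caption_fn(image_after))

# Usage with a dummy captioner in place of a real LVLM:
dummy = lambda img: f"an aerial view of {img}"
prompt = explain_change("farmland", "a housing development", dummy)
print(prompt)
```

This works around the single-image input limit by never feeding the model both images at once: the pairing happens at the text level, where the two captions can be compared in one reasoning chain.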
Related papers
- AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization [57.34659640776723]
We propose an end-to-end framework named AddressCLIP to solve the problem with more semantics.
We have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem.
arXiv Detail & Related papers (2024-07-11T03:18:53Z) - Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration [5.095834019284525]
Urbanization advances at unprecedented rates, resulting in negative effects on the environment and human well-being.
Deep learning-based methods have achieved promising urban change detection results from optical satellite image pairs.
We propose a continuous urban change detection method that identifies changes in each consecutive image pair of a satellite image time series.
arXiv Detail & Related papers (2024-06-25T10:53:57Z) - SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models [3.839322642354617]
We propose a novel diffusion-based fusion algorithm called SatDiffMoE.
Our algorithm is highly flexible and allows training and inference on an arbitrary number of low-resolution images.
Experimental results show that our proposed SatDiffMoE method achieves superior performance for the satellite image super-resolution tasks.
arXiv Detail & Related papers (2024-06-14T17:58:28Z) - Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z) - SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z) - Self-Supervision in Time for Satellite Images (S3-TSS): A novel method of SSL technique in Satellite images [0.38366697175402226]
We propose S3-TSS, a novel self-supervised learning method that leverages the natural augmentation occurring in the temporal dimension.
Our method outperforms the SeCo baseline on four downstream datasets.
arXiv Detail & Related papers (2024-03-07T19:16:17Z) - VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View [81.58612867186633]
Vision and Language Navigation (VLN) requires visual and natural language understanding as well as spatial and temporal reasoning capabilities.
We show that VELMA is able to successfully follow navigation instructions in Street View with only two in-context examples.
We further fine-tune the LLM agent on a few thousand examples and achieve a 25%-30% relative improvement in task completion over the previous state-of-the-art on two datasets.
arXiv Detail & Related papers (2023-07-12T11:08:24Z) - Unsupervised Discovery of Semantic Concepts in Satellite Imagery with Style-based Wavelet-driven Generative Models [27.62417543307831]
We present the first pre-trained style- and wavelet-based GAN model that can synthesize a wide gamut of realistic satellite images.
We show that by analyzing the intermediate activations of our network, one can discover a multitude of interpretable semantic directions.
arXiv Detail & Related papers (2022-08-03T14:19:24Z) - SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on the Masked Autoencoder (MAE).
To leverage temporal information, we include a temporal embedding along with independently masking image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z) - Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization [9.333087475006003]
Cross-view image based geo-localization is notoriously challenging due to drastic viewpoint and appearance differences between the two domains.
We show that we can address this discrepancy explicitly by learning to synthesize realistic street views from satellite inputs.
We propose a novel multi-task architecture in which image synthesis and retrieval are considered jointly.
arXiv Detail & Related papers (2021-03-11T17:40:59Z) - Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google-style omnidirectional street-view panorama, as if it were captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.