Related papers: A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching

A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching

URL: http://arxiv.org/abs/2404.11302v2
Date: Thu, 23 May 2024 11:30:05 GMT
Title: A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching
Authors: Francesco Pro, Nikolaos Dionelis, Luca Maiano, Bertrand Le Saux, Irene Amerini,
Abstract summary: This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data. This is done by comparing the features from a ground-view image and a satellite one, innovatively leveraging the corresponding latter's segmentation mask through a three-stream Siamese-like network. The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images.
Score: 30.324252605889356
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Nowadays the accurate geo-localization of ground-view images has an important role across domains as diverse as journalism, forensics analysis, transports, and Earth Observation. This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data. This is done by comparing the features from a ground-view image and a satellite one, innovatively leveraging the corresponding latter's segmentation mask through a three-stream Siamese-like network. The proposed method, Semantic Align Net (SAN), focuses on limited Field-of-View (FoV) and ground panorama images (images with a FoV of 360{\deg}). The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images. This work shows how SAN through semantic analysis of images improves the performance on the unlabelled CVUSA dataset for all the tested FoVs.

Related papers

Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation [1.9055921262476347]
Recent advancements in generative AI techniques have raised serious concerns about the credibility of digital media available on the Internet. To address these concerns, the ability to geolocate a non-geo-tagged ground-view image without external information, such as GPS coordinates, has become increasingly critical. This study tackles the challenge of linking a ground-view image, potentially exhibiting varying fields of view (FoV), to its corresponding satellite image without the aid of GPS data.
arXiv Detail & Related papers (2025-02-10T09:31:12Z)
Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis [48.945931374180795]
This paper presents a novel approach for cross-view synthesis aimed at generating plausible ground-level images from corresponding satellite imagery or vice versa. We refer to these tasks as satellite-to-ground (Sat2Grd) and ground-to-satellite (Grd2Sat) synthesis, respectively.
arXiv Detail & Related papers (2024-12-04T13:47:51Z)
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks. To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z)
MetaSegNet: Metadata-collaborative Vision-Language Representation Learning for Semantic Segmentation of Remote Sensing Images [7.0622873873577054]
We propose a novel metadata-collaborative segmentation network (MetaSegNet) for semantic segmentation of remote sensing images. Unlike the common model structure that only uses unimodal visual data, we extract the key characteristic from freely available remote sensing image metadata. We construct an image encoder, a text encoder, and a crossmodal attention fusion subnetwork to extract the image and text feature.
arXiv Detail & Related papers (2023-12-20T03:16:34Z)
CLiSA: A Hierarchical Hybrid Transformer Model using Orthogonal Cross Attention for Satellite Image Cloud Segmentation [5.178465447325005]
Deep learning algorithms have emerged as promising approach to solve image segmentation problems. In this paper, we introduce a deep-learning model for effective cloud mask generation named CLiSA - Cloud segmentation via Lipschitz Stable Attention network. We demonstrate both qualitative and quantitative outcomes for multiple satellite image datasets including Landsat-8, Sentinel-2, and Cartosat-2s.
arXiv Detail & Related papers (2023-11-29T09:31:31Z)
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language. Most existing VG datasets are constructed using simple description texts. We propose a novel benchmark of underlineScene underlineKnowledge-guided underlineVisual underlineGrounding.
arXiv Detail & Related papers (2023-07-21T13:06:02Z)
DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions. We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images. Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images. It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image. With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization [89.69214577915959]
This paper tackles the problem of Cross-view Video-based camera localization. We propose estimating the query camera's relative displacement to a satellite image before similarity matching. Experiments have demonstrated the effectiveness of video-based localization over single image-based localization.
arXiv Detail & Related papers (2022-08-07T07:35:17Z)
Geo-Localization via Ground-to-Satellite Cross-View Image Retrieval [25.93015219830576]
Given a ground-view image of a landmark, we aim to achieve cross-view geo-localization by searching out its corresponding satellite-view images. We take advantage of drone-view information as a bridge between ground-view and satellite-view domains.
arXiv Detail & Related papers (2022-05-22T17:35:13Z)
Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image. Our method generates a Google's omnidirectional street-view type panorama, as if it is captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.