Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation
- URL: http://arxiv.org/abs/2502.06288v3
- Date: Mon, 24 Feb 2025 14:04:41 GMT
- Title: Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation
- Authors: Emanuele Mule, Matteo Pannacci, Ali Ghasemi Goudarzi, Francesco Pro, Lorenzo Papa, Luca Maiano, Irene Amerini
- Abstract summary: Recent advancements in generative AI techniques have raised serious concerns about the credibility of digital media available on the Internet. To address these concerns, the ability to geolocate a non-geo-tagged ground-view image without external information, such as GPS coordinates, has become increasingly critical. This study tackles the challenge of linking a ground-view image, potentially exhibiting varying fields of view (FoV), to its corresponding satellite image without the aid of GPS data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advancements in generative AI techniques, which have significantly increased the online dissemination of altered images and videos, have raised serious concerns about the credibility of digital media available on the Internet and distributed through information channels and social networks. This issue particularly affects domains that rely heavily on trustworthy data, such as journalism, forensic analysis, and Earth observation. To address these concerns, the ability to geolocate a non-geo-tagged ground-view image without external information, such as GPS coordinates, has become increasingly critical. This study tackles the challenge of linking a ground-view image, potentially exhibiting varying fields of view (FoV), to its corresponding satellite image without the aid of GPS data. To achieve this, we propose a novel four-stream Siamese-like architecture, the Quadruple Semantic Align Net (SAN-QUAD), which extends previous state-of-the-art (SOTA) approaches by leveraging semantic segmentation applied to both ground and satellite imagery. Experimental results on a subset of the CVUSA dataset demonstrate significant improvements of up to 9.8% over prior methods across various FoV settings.
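The abstract describes a four-stream Siamese-like design in which each view (ground and satellite) is paired with its semantic segmentation mask and the resulting streams are fused before matching. The sketch below is not the authors' SAN-QUAD implementation; it is a minimal toy illustration of that matching idea, where a stand-in `toy_encoder` replaces a learned CNN backbone, the fusion is simple concatenation, and matching is done by cosine similarity.

```python
# Toy sketch (NOT the SAN-QUAD implementation) of four-stream matching:
# two streams per view (image + segmentation mask), fused into a single
# descriptor per view, compared across views with cosine similarity.
import math
import random


def toy_encoder(pixels, dim=8):
    # Stand-in for a learned CNN: a deterministic pseudo-random projection.
    rnd = random.Random(int(sum(pixels) * 1_000_003))
    return [sum(p * rnd.uniform(-1.0, 1.0) for p in pixels) for _ in range(dim)]


def fuse(image_emb, seg_emb):
    # Fuse an image embedding with its segmentation embedding (concatenation).
    return image_emb + seg_emb


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


# Toy inputs: flattened "images" and binary "segmentation masks"
# for one ground/satellite pair (values are illustrative only).
ground_img, ground_seg = [0.2, 0.5, 0.1], [1.0, 0.0, 1.0]
sat_img, sat_seg = [0.3, 0.4, 0.2], [1.0, 0.0, 1.0]

ground_desc = fuse(toy_encoder(ground_img), toy_encoder(ground_seg))
sat_desc = fuse(toy_encoder(sat_img), toy_encoder(sat_seg))

score = cosine(ground_desc, sat_desc)
print(f"ground-to-satellite similarity: {score:.3f}")
```

In the actual paper the encoders are trained so that matching ground/satellite pairs score higher than non-matching ones (a retrieval objective); here the score is only meaningful as a demonstration of the data flow.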
Related papers
- AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis [57.249817395828174]
We propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes with real, ground-level crowd-sourced images.
The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images.
Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks.
arXiv Detail & Related papers (2025-04-17T17:57:05Z) - Game4Loc: A UAV Geo-Localization Benchmark from Game Data [0.0]
We introduce a more practical UAV geo-localization task including partial matches of cross-view paired data.
Experiments demonstrate the effectiveness of our data and training method for UAV geo-localization.
arXiv Detail & Related papers (2024-09-25T13:33:28Z) - Weakly-supervised Camera Localization by Ground-to-satellite Image Registration [52.54992898069471]
We propose a weakly supervised learning strategy for ground-to-satellite image registration.
It derives positive and negative satellite images for each ground image.
We also propose a self-supervision strategy for cross-view image relative rotation estimation.
arXiv Detail & Related papers (2024-09-10T12:57:16Z) - Geospecific View Generation -- Geometry-Context Aware High-resolution Ground View Inference from Satellite Views [5.146618378243241]
We propose a novel pipeline to generate geospecific views that maximally respect the weak geometry and texture from multi-view satellite images.
Our method directly predicts ground-view images at geolocation by using a comprehensive set of information from the satellite image.
We demonstrate our pipeline is the first to generate close-to-real and geospecific ground views merely based on satellite images.
arXiv Detail & Related papers (2024-07-10T21:51:50Z) - Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data [66.49494950674402]
We leverage emerging text-to-image generative models in creating large-scale synthetic supervision for the task of damage assessment from aerial images.
We build an efficient and easily scalable pipeline to generate thousands of post-disaster images from low-resource domains.
We validate the strength of our proposed framework under a cross-geography domain-transfer setting using xBD and SKAI images in both single-source and multi-source settings.
arXiv Detail & Related papers (2024-05-22T16:07:05Z) - A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching [30.324252605889356]
This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data.
This is done by comparing features extracted from the ground-view image and the satellite image, innovatively leveraging the satellite image's segmentation mask through a three-stream Siamese-like network.
The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images.
arXiv Detail & Related papers (2024-04-17T12:13:18Z) - Getting it Right: Improving Spatial Consistency in Text-to-Image Models [103.52640413616436]
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
We create SPRIGHT, the first spatially focused, large-scale dataset, by re-captioning 6 million images from 4 widely used vision datasets.
We find that training on images containing a larger number of objects leads to substantial improvements in spatial consistency, including state-of-the-art results on T2I-CompBench with a spatial score of 0.2133, by fine-tuning on 500 images.
arXiv Detail & Related papers (2024-04-01T15:55:25Z) - SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z) - Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z) - Orientation-Guided Contrastive Learning for UAV-View Geo-Localisation [0.0]
We present an orientation-guided training framework for UAV-view geo-localisation.
We experimentally demonstrate that this prediction supports the training and outperforms previous approaches.
We achieve state-of-the-art results on both the University-1652 and University-160k datasets.
arXiv Detail & Related papers (2023-08-02T07:32:32Z) - Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z) - Weakly Supervised Domain Adaptation for Built-up Region Segmentation in Aerial and Satellite Imagery [3.8508264614798517]
Built-up area estimation is an important component in understanding the human impact on the environment, the effect of public policy, and general urban population analysis.
The diverse nature of aerial and satellite imagery, and the lack of labeled data covering this diversity, make it difficult for machine learning algorithms to generalize.
This paper proposes a novel domain adaptation algorithm to handle the challenges posed by the satellite and aerial imagery.
arXiv Detail & Related papers (2020-07-05T10:05:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.