Related papers: PlaceFM: A Training-free Geospatial Foundation Model of Places

Related papers

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization [53.080882980294795]
Current research on agentic visual reasoning enables deep multimodal understanding but primarily focuses on image manipulation tools.<n>In this work, we revisit the geolocalization task, which requires not only nuanced visual grounding but also web search to confirm or refine hypotheses.<n>Since existing geolocalization benchmarks fail to meet the need for high-resolution imagery and the localization challenge for deep agentic reasoning, we curate GeoBench.<n>We propose GeoVista, an agentic model that seamlessly integrates tool invocation within the reasoning loop, including an image-zoom-in tool to magnify regions of interest and a web-search tool to retrieve related
arXiv Detail & Related papers (2025-11-19T18:59:22Z)
GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization [70.65458151146767]
Cross-view localization is crucial for large-scale outdoor applications like autonomous navigation and augmented reality.<n>Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations.<n>We propose GeoDistill, a framework that uses teacher-student learning with Field-of-View (FoV)-based masking.
arXiv Detail & Related papers (2025-07-15T03:00:15Z)
Geographical Context Matters: Bridging Fine and Coarse Spatial Information to Enhance Continental Land Cover Mapping [2.9212099078191756]
BRIDGE-LC is a novel deep learning framework that integrates multi-scale geospatial information into the land cover classification process.<n>Our results demonstrate that integrating geospatial information improves land cover mapping performance.
arXiv Detail & Related papers (2025-04-16T17:42:46Z)
LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space [10.342723428164412]
We propose to leverage diffusion as a mechanism for image geolocalization.<n>To avoid the problematic manifold reprojection step in diffusion, we developed a novel spherical positional encoding-decoding framework.<n>We train a conditional latent diffusion model called LocDiffusion that generates geolocations under the guidance of images.
arXiv Detail & Related papers (2025-03-23T17:15:26Z)
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
multimodal large language models (LLMs) have opened new frontiers in artificial intelligence.<n>We propose a MLLM (OmniGeo) tailored to geospatial applications.<n>By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
GeoJEPA: Towards Eliminating Augmentation- and Sampling Bias in Multimodal Geospatial Learning [0.0]
We present GeoJEPA, a versatile multimodal fusion model for geospatial data built on the self-supervised Joint-Embedding Predictive Architecture.<n>We aim to eliminate the widely accepted augmentation- and sampling biases found in self-supervised geospatial representation learning.<n>The results are multimodal semantic representations of urban regions and map entities that we evaluate both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-02-25T22:03:28Z)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components.<n>GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric.<n>We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
Self-Supervised Representation Learning for Geospatial Objects: A Survey [21.504978593542354]
Self-supervised learning (SSL) has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision.<n>This paper presents a survey of SSL techniques specifically applied to or developed for geospatial objects in three primary geometric vector types: Point, Polyline, and Polygon.<n>We examine the emerging trends in SSL for geospatial objects, particularly the gradual advancements towards geospatial foundation models.
arXiv Detail & Related papers (2024-08-22T05:28:22Z)
Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning [10.438284728725842]
Universal representation models are less prevalent than their extensive use in natural language processing and computer vision.<n>This discrepancy arises primarily from the high costs associated with the input of existing representation models.<n>We develop a training-free method that leverages large language models to derive geolocation representations.
arXiv Detail & Related papers (2024-08-22T04:05:02Z)
Enhancing Worldwide Image Geolocation by Ensembling Satellite-Based Ground-Level Attribute Predictors [4.415977307120618]
We examine the challenge of estimating the location of a single ground-level image in the absence of GPS or other location metadata. We introduce a novel metric, Recall vs Area, which measures the accuracy of estimated distributions of locations. We then examine an ensembling approach to global-scale image geolocation, which incorporates information from multiple sources.
arXiv Detail & Related papers (2024-07-18T19:15:52Z)
City Foundation Models for Learning General Purpose Representations from OpenStreetMap [16.09047066527081]
We present CityFM, a framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OpenStreetMap, and produces multimodal representations of entities of different types, spatial, visual, and textual information. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.
arXiv Detail & Related papers (2023-10-01T05:55:30Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
Assessment of a new GeoAI foundation model for flood inundation mapping [4.312965283062856]
This paper evaluates the performance of the first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, to support a crucial geospatial analysis task: flood inundation mapping. A benchmark dataset, Sen1Floods11, is used in the experiments, and the models' predictability, generalizability, and transferability are evaluated. Results show the good transferability of the Prithvi model, highlighting its performance advantages in segmenting flooded areas in previously unseen regions.
arXiv Detail & Related papers (2023-09-25T19:50:47Z)
Geo-Encoder: A Chunk-Argument Bi-Encoder Framework for Chinese Geographic Re-Ranking [61.60169764507917]
Chinese geographic re-ranking task aims to find the most relevant addresses among retrieved candidates. We propose an innovative framework, namely Geo-Encoder, to more effectively integrate Chinese geographical semantics into re-ranking pipelines.
arXiv Detail & Related papers (2023-09-04T13:44:50Z)
A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias. We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms. Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center. We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.