GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving
- URL: http://arxiv.org/abs/2602.08524v1
- Date: Mon, 09 Feb 2026 11:15:01 GMT
- Title: GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving
- Authors: Linger Deng, Yuliang Liu, Wenwen Yu, Zujia Zhang, Jianzhong Ju, Zhenbo Luo, Xiang Bai,
- Abstract summary: GeoFocus is a novel framework comprising two core modules.<n>GeoFocus achieves a 4.7% accuracy improvement over leading specialized models.<n>It demonstrates superior robustness in MATHVERSE under diverse visual conditions.
- Score: 55.14836667214487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs), requiring not only global shape recognition but also attention to intricate local relationships related to geometric theory. To address this, we propose GeoFocus, a novel framework comprising two core modules. 1) Critical Local Perceptor, which automatically identifies and emphasizes critical local structure (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates, boosting critical local feature coverage by 61% compared to previous methods. 2) VertexLang, a compact topology formal language, encodes global figures through vertex coordinates and connectivity relations. By replacing bulky code-based encodings, VertexLang reduces global perception training time by 20% while improving topology recognition accuracy. When evaluated in Geo3K, GeoQA, and FormalGeo7K, GeoFocus achieves a 4.7% accuracy improvement over leading specialized models and demonstrates superior robustness in MATHVERSE under diverse visual conditions. Project Page -- https://github.com/dle666/GeoFocus
Related papers
- Thinking with Geometry: Active Geometry Integration for Spatial Reasoning [68.59084007360615]
We propose GeoThinker, a framework that shifts paradigm passive fusion to active perception.<n>Instead of feature mixing, GeoThinker enables the model to selectively retrieve geometric evidence conditioned on its internal reasoning demands.<n>Our results indicate that the ability to actively integrate spatial structures is essential for next-generation spatial intelligence.
arXiv Detail & Related papers (2026-02-05T18:59:32Z) - GeoEvolve: Automating Geospatial Model Discovery via Multi-Agent Large Language Models [49.257706111340134]
We introduce GeoEvolve, a multi-agent LLM framework that couples evolutionary search with geospatial domain knowledge.<n>We evaluate it on two fundamental and classical tasks: spatial (kriging) and spatial uncertainty.<n>It reduces spatial error (RMSE) by 13-21% and enhances uncertainty estimation performance by 17%.
arXiv Detail & Related papers (2025-09-25T21:03:57Z) - Geo2Vec: Shape- and Distance-Aware Neural Representation of Geospatial Entities [13.206124101350847]
We introduce Geo2Vec, a novel method inspired by signed distance fields (SDF) that operates directly in the original space.<n>A neural network trained to approximate the SDF produces compact, geometry-aware, and unified representations for all geo-entity types.<n> Empirical results show that Geo2Vec consistently outperforms existing methods in representing shape and location, capturing topological and distance relationships, and achieving greater efficiency in real-world GeoAI applications.
arXiv Detail & Related papers (2025-08-26T07:12:28Z) - TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving [106.04001249574786]
TrustGeoGen is a data engine that generates formally verified geometric problems to establish a principled and trustworthy benchmark.<n>Our engine integrates four key innovations: 1) Multimodal Alignment, which synchronizes the generation of diagrams, text, and step-by-step solutions; 2) Formal Verification, ensuring all reasoning paths are rule-compliant; 3) Connection Thinking, bridging formal deduction with human-like logical steps; and 4) our textitGeoExplore series algorithms, which produce diverse problem variants with multiple solutions and self-reflective backtracking.
arXiv Detail & Related papers (2025-04-22T10:45:23Z) - Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components.<n>GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric.<n>We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z) - GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer [41.26276375114911]
Point cloud completion aims to recover accurate global geometry and preserve fine-grained local details from partial point clouds.
Conventional methods typically predict unseen points directly from 3D point cloud coordinates or use self-projected multi-view depth maps.
We introduce a GeoFormer that simultaneously enhances the global geometric structure of the points and improves the local details.
arXiv Detail & Related papers (2024-08-13T03:15:36Z) - CurriculumLoc: Enhancing Cross-Domain Geolocalization through
Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description with global semantic awareness and a local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distances metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z) - Geo-Encoder: A Chunk-Argument Bi-Encoder Framework for Chinese
Geographic Re-Ranking [61.60169764507917]
Chinese geographic re-ranking task aims to find the most relevant addresses among retrieved candidates.
We propose an innovative framework, namely Geo-Encoder, to more effectively integrate Chinese geographical semantics into re-ranking pipelines.
arXiv Detail & Related papers (2023-09-04T13:44:50Z) - Focus on Local: Detecting Lane Marker from Bottom Up via Key Point [10.617793053931964]
We propose a novel lane marker detection solution, FOLOLane, that focuses on modeling local patterns and achieving prediction of global structures.
Specifically, the CNN models lowcomplexity local patterns with two separate heads, the first one predicts the existence of key points, and the second refines the location of key points in the local range and correlates key points of the same lane line.
arXiv Detail & Related papers (2021-05-28T08:59:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.