Related papers: NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation

URL: http://arxiv.org/abs/2505.17121v2
Date: Thu, 02 Oct 2025 18:15:25 GMT
Title: NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Authors: Weiming Wu, Jin Ye, Zi-kang Wang, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo
Abstract summary: NeSyGeo is a novel neuro-symbolic framework for generating geometric reasoning data. We release a new benchmark NeSyGeo-Test for evaluating geometric reasoning abilities in MLLMs.
Score: 23.592137999309546
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Obtaining large-scale, high-quality reasoning data is crucial for improving the geometric reasoning capabilities of multi-modal large language models (MLLMs). However, existing data generation methods, whether based on predefined templates or constrained symbolic provers, inevitably face diversity and numerical generalization limitations. To address these limitations, we propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. First, we propose a domain-specific language grounded in the entity-attributes-relations paradigm to comprehensively represent all components of plane geometry, along with generative actions defined within this symbolic space. We then design a symbolic-visual-text pipeline that synthesizes symbolic sequences, maps them to visual and textual representations, and generates reasoning paths with reverse search and forward validation. Based on this framework, we construct the NeSyGeo-CoT and NeSyGeo-Caption datasets, containing 100k samples, and release a new benchmark NeSyGeo-Test for evaluating geometric reasoning abilities in MLLMs. Experiments demonstrate that the proposal significantly and consistently improves the performance of multiple MLLMs under both reinforcement and supervised fine-tuning. With only 4k samples and two epochs of reinforcement fine-tuning, base models achieve improvements of up to +15.8% on MathVision, +8.4% on MathVerse, and +7.3% on GeoQA. Notably, a 4B model can be improved to outperform an 8B model from the same series on geometric reasoning tasks.

Related papers

Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward [67.00373428443879]
We introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine. We propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse signals with dense rewards based on the Skeleton Rate.
arXiv Detail & Related papers (2026-01-08T16:17:56Z)
GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language [11.134307550723037]
Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry. These models face challenges in mathematical geometric reasoning due to the scarcity of high-quality geometric data. We propose GeoFM, a novel method for synthesizing geometric data.
arXiv Detail & Related papers (2025-10-31T12:56:32Z)
GeoThought: A Dataset for Enhancing Mathematical Geometry Reasoning in Vision-Language Models [3.66076510862044]
We develop a comprehensive geometric reasoning corpus with two subsets: Geo-Thought-6K with 6,243 samples and its augmented version Geo-Thought-Augmented-10K containing 10,834 samples. Using this dataset, we developed GeoThought-MLLM, a mathematical reasoning multimodal model that generates detailed thinking processes during problem-solving. Our model outperforms existing benchmarks in geometric tasks, demonstrating that training with our Chain-of-Thought dataset improves geometric reasoning capabilities across both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2025-10-23T16:43:54Z)
CapGeo: A Caption-Assisted Approach to Geometric Reasoning [10.716955074782902]
We introduce CapGeo, a caption-assisted reasoning framework that bridges visual and textual modalities. Experiments show substantial improvements when models are equipped with captions. We also propose CapGeo-Bench, a dataset of 4,641 curated figure-caption pairs.
arXiv Detail & Related papers (2025-10-10T11:47:54Z)
GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions [45.70578816057097]
We introduce the task of Referring Expression Comprehension (REC) for geometric problems. REC evaluates whether models can localize points, shapes, and spatial relations in diagrams in response to textual prompts. We generate a large-scale synthetic training dataset using a structured geometric formal language.
arXiv Detail & Related papers (2025-09-25T12:00:52Z)
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration [57.95306827012784]
We propose GeoGen, a pipeline that can automatically generate step-wise reasoning paths for geometry diagrams. By leveraging precise symbolic reasoning, GeoGen produces large-scale, high-quality question-answer pairs. We train GeoLogic, a Large Language Model (LLM), using synthetic data generated by GeoGen.
arXiv Detail & Related papers (2025-04-17T09:13:46Z)
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions [23.294711275107606]
This paper introduces Geoperception, a benchmark to evaluate an MLLM's ability to accurately transcribe 2D geometric information from an image. We then conduct a comprehensive empirical study to explore strategies for improving their performance on geometric tasks. We develop Euclid, a family of models specifically optimized for strong low-level geometric perception.
arXiv Detail & Related papers (2024-12-11T19:12:13Z)
Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions. Our approach uses diffusion models with a novel network architecture to learn surface point distributions. We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z)
STREAM: A Universal State-Space Model for Sparse Geometric Data [2.9483719973596303]
Handling unstructured geometric data, such as point clouds or event-based vision, is a pressing challenge in the field of machine vision. We propose to encode geometric structure explicitly into the parameterization of a state-space model. Our model deploys the Mamba selective state-space model with a modified kernel to efficiently map sparse data to modern hardware.
arXiv Detail & Related papers (2024-11-19T16:06:32Z)
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models [86.06825304372613]
We propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results.
arXiv Detail & Related papers (2024-10-23T13:58:39Z)
Grounding Continuous Representations in Geometry: Equivariant Neural Fields [26.567143650213225]
We propose a novel CNF architecture which uses a geometry-informed cross-attention to condition the NeF on a geometric variable. We show that this approach induces a steerability property by which both field and latent are grounded in geometry. We validate these main properties in a range of tasks including classification, segmentation, forecasting, reconstruction and generative modelling.
arXiv Detail & Related papers (2024-06-09T12:16:30Z)
Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC). LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses. LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications [71.809127869349]
This paper formalizes geometric graph as the data structure, on top of which we provide a unified view of existing models from the geometric message passing perspective. We also summarize the applications as well as the related datasets to facilitate later research for methodology development and experimental evaluation.
arXiv Detail & Related papers (2024-03-01T12:13:04Z)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities. G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
Towards General-Purpose Representation Learning of Polygonal Geometries [62.34832826705641]
We develop a general-purpose polygon encoding model, which can encode a polygonal geometry into an embedding space. We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets - DBSR-46K and DBSR-cplx46K. Our results show that NUFTspec and ResNet1D outperform multiple existing baselines with significant margins.
arXiv Detail & Related papers (2022-09-29T15:59:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.