NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
- URL: http://arxiv.org/abs/2505.17121v1
- Date: Wed, 21 May 2025 16:45:49 GMT
- Title: NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
- Authors: Weiming Wu, Zi-kang Wang, Jin Ye, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo
- Abstract summary: We propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. We release a new benchmark NeSyGeo-Test for evaluating geometric reasoning abilities in large language models.
- Score: 47.58527162381057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining large-scale, high-quality data with reasoning paths is crucial for improving the geometric reasoning capabilities of multi-modal large language models (MLLMs). However, existing data generation methods, whether based on predefined templates or constrained symbolic provers, inevitably face diversity and numerical generalization limitations. To address these limitations, we propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. First, we propose a domain-specific language grounded in the entity-relation-constraint paradigm to comprehensively represent all components of plane geometry, along with generative actions defined within this symbolic space. We then design a symbolic-visual-text pipeline that synthesizes symbolic sequences, maps them to corresponding visual and textual representations, and generates diverse question-answer (Q&A) pairs using large language models (LLMs). To the best of our knowledge, we are the first to propose a neuro-symbolic approach in generating multimodal reasoning data. Based on this framework, we construct NeSyGeo-CoT and NeSyGeo-Caption datasets, containing 100k samples, and release a new benchmark NeSyGeo-Test for evaluating geometric reasoning abilities in MLLMs. Experiments demonstrate that the proposal significantly and consistently improves the performance of multiple MLLMs under both reinforcement and supervised fine-tuning. With only 4k samples and two epochs of reinforcement fine-tuning, base models achieve improvements of up to +15.8% on MathVision, +8.4% on MathVerse, and +7.3% on GeoQA. Notably, a 4B model can be improved to outperform an 8B model from the same series on geometric reasoning tasks.
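To make the entity-relation-constraint idea in the abstract concrete, here is a minimal Python sketch. It is not the NeSyGeo domain-specific language: the class names (`Entity`, `Relation`, `Constraint`), the helper functions `to_text` and `hypotenuse`, and the right-triangle scene are all hypothetical, chosen only to show how a symbolic sequence might be mapped to a textual description and to a numeric answer. The visual rendering and the LLM-based Q&A generation steps are not modelled.

```python
from dataclasses import dataclass
import math


# Hypothetical statement types; the paper's actual DSL is not reproduced here.
@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "triangle"
    name: str   # e.g. "ABC"


@dataclass(frozen=True)
class Relation:
    kind: str    # e.g. "perpendicular"
    args: tuple  # names of the two related entities


@dataclass(frozen=True)
class Constraint:
    target: str   # name of the measured entity, e.g. "AB"
    value: float  # numeric value attached to it


def to_text(statements):
    """Map a symbolic sequence to a plain-text description."""
    parts = []
    for s in statements:
        if isinstance(s, Entity):
            parts.append(f"{s.kind} {s.name}")
        elif isinstance(s, Relation):
            parts.append(f"{s.args[0]} is {s.kind} to {s.args[1]}")
        elif isinstance(s, Constraint):
            parts.append(f"{s.target} = {s.value:g}")
    return "; ".join(parts)


def hypotenuse(statements):
    """Derive a ground-truth answer from the numeric constraints (assumes legs AB and BC)."""
    lengths = {s.target: s.value for s in statements if isinstance(s, Constraint)}
    return math.hypot(lengths["AB"], lengths["BC"])


# An assumed example scene: a right triangle with legs 3 and 4.
scene = [
    Entity("triangle", "ABC"),
    Relation("perpendicular", ("AB", "BC")),
    Constraint("AB", 3.0),
    Constraint("BC", 4.0),
]

caption = to_text(scene)
answer = hypotenuse(scene)
```

In this sketch the same symbolic sequence drives both the caption and the answer, which is the property that lets a symbolic representation keep generated text and numeric ground truth consistent.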
Related papers
- Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration [57.95306827012784]
We propose GeoGen, a pipeline that can automatically generate step-wise reasoning paths for geometry diagrams. By leveraging the precise symbolic reasoning, GeoGen produces large-scale, high-quality question-answer pairs. We train GeoLogic, a Large Language Model (LLM), using synthetic data generated by GeoGen.
arXiv Detail & Related papers (2025-04-17T09:13:46Z)
- Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions [23.294711275107606]
This paper introduces Geoperception, a benchmark to evaluate an MLLM's ability to accurately transcribe 2D geometric information from an image. We then conduct a comprehensive empirical study to explore strategies for improving their performance on geometric tasks. We develop Euclid, a family of models specifically optimized for strong low-level geometric perception.
arXiv Detail & Related papers (2024-12-11T19:12:13Z)
- Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions.
Our approach uses diffusion models with a novel network architecture to learn surface point distributions.
We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z)
- STREAM: A Universal State-Space Model for Sparse Geometric Data [2.9483719973596303]
Handling unstructured geometric data, such as point clouds or event-based vision, is a pressing challenge in the field of machine vision.
We propose to encode geometric structure explicitly into the parameterization of a state-space model.
Our model deploys the Mamba selective state-space model with a modified kernel to efficiently map sparse data to modern hardware.
arXiv Detail & Related papers (2024-11-19T16:06:32Z)
- R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models [86.06825304372613]
We propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline.
First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions.
We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results.
arXiv Detail & Related papers (2024-10-23T13:58:39Z)
- Grounding Continuous Representations in Geometry: Equivariant Neural Fields [26.567143650213225]
We propose a novel CNF architecture which uses a geometry-informed cross-attention to condition the NeF on a geometric variable. We show that this approach induces a steerability property by which both field and latent are grounded in geometry. We validate these main properties in a range of tasks including classification, segmentation, forecasting, reconstruction and generative modelling.
arXiv Detail & Related papers (2024-06-09T12:16:30Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications [71.809127869349]
This paper formalizes geometric graph as the data structure, on top of which we provide a unified view of existing models from the geometric message passing perspective. We also summarize the applications as well as the related datasets to facilitate later research for methodology development and experimental evaluation.
arXiv Detail & Related papers (2024-03-01T12:13:04Z)
- G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities.
G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)