Related papers: R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

URL: http://arxiv.org/abs/2410.17885v2
Date: Sun, 27 Oct 2024 09:02:01 GMT
Title: R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
Authors: Linger Deng, Yuliang Liu, Bohan Li, Dongliang Luo, Liang Wu, Chengquan Zhang, Pengyuan Lyu, Ziyang Zhang, Gang Zhang, Errui Ding, Yingying Zhu, Xiang Bai
Abstract summary: We propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results.
Score: 86.06825304372613
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning due to a lack of high-quality image-text paired data. Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity. To synthesize higher-quality data, we propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions highlighting relations among geometric elements. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results. Experiments demonstrate that the proposed method brings significant and consistent improvements on multiple LMM baselines, achieving new performance records in the 2B, 7B, and 8B settings. Notably, R-CoT-8B significantly outperforms previous state-of-the-art open-source mathematical models by 16.6% on MathVista and 9.2% on GeoQA, while also surpassing the closed-source model GPT-4o by an average of 13% across both datasets. The code is available at https://github.com/dle666/R-CoT.
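The "reason first, ask afterwards" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that): the names `Step`, `reason_forward`, and `questions_in_reverse` are hypothetical, the figure is hard-coded as a right triangle, and the reasoning is done with plain arithmetic rather than a language model. It only shows the ordering: derive results step by step from a description, then write a question for each derived result.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    statement: str  # one reasoning step in natural language
    quantity: str   # what the step derives
    value: float    # the derived value


def reason_forward(a: float, b: float) -> list[Step]:
    """Reason step by step from a description (assumed: right triangle ABC, legs AB=a, BC=b)."""
    hyp = math.hypot(a, b)
    perimeter = a + b + hyp
    return [
        Step(f"By the Pythagorean theorem, AC = sqrt({a:g}^2 + {b:g}^2) = {hyp:g}.",
             "the length of AC", hyp),
        Step(f"The perimeter is AB + BC + AC = {a:g} + {b:g} + {hyp:g} = {perimeter:g}.",
             "the perimeter of triangle ABC", perimeter),
    ]


def questions_in_reverse(description: str, steps: list[Step]) -> list[dict]:
    """Turn each derived result into a question; the rationale is the chain up to that step."""
    qa = []
    for i, step in enumerate(steps):
        qa.append({
            "question": f"{description} What is {step.quantity}?",
            "rationale": " ".join(s.statement for s in steps[: i + 1]),
            "answer": step.value,
        })
    return qa


description = "Triangle ABC has a right angle at B, with AB = 3 and BC = 4."
for item in questions_in_reverse(description, reason_forward(3, 4)):
    print(item["question"], "->", f"{item['answer']:g}")
```

Because each answer is computed before its question is written, every generated question is answerable from the description by construction, which is the property the reverse ordering is meant to provide in this sketch.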

Related papers

Hyperbolic Deep Learning for Foundation Models: A Survey [16.14776172953206]
Foundation models pre-trained on massive datasets have demonstrated remarkable success in diverse downstream tasks. Recent advances have leveraged hyperbolic neural networks to enhance foundation models. This paper provides a comprehensive review of hyperbolic neural networks and their recent development for foundation models.
arXiv Detail & Related papers (2025-07-23T09:50:17Z)
GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization [63.107398132743825]
Group Contrastive Policy Optimization (GCPO) is a novel reinforcement learning framework featuring two key innovations. We develop GeometryZero, a family of affordable-size geometric reasoning models that judiciously determine when to employ auxiliary construction.
arXiv Detail & Related papers (2025-06-08T14:18:15Z)
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation [47.58527162381057]
We propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. We release a new benchmark NeSyGeo-Test for evaluating geometric reasoning abilities in large language models.
arXiv Detail & Related papers (2025-05-21T16:45:49Z)
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought [56.71873693264532]
We prove that a two-layer transformer with $D$ steps of continuous CoTs can solve the directed graph reachability problem. In our construction, each continuous thought vector is a superposition state that encodes multiple search frontiers simultaneously.
arXiv Detail & Related papers (2025-05-18T18:36:53Z)
TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving [66.0201510984171]
We propose a scalable data engine called TrustGeoGen for problem generation. By formal verification, TrustGeoGen produces the GeoTrust-200K dataset with guaranteed modality integrity. Experiments reveal that state-of-the-art models achieve only 49.17% accuracy on GeoTrust-test.
arXiv Detail & Related papers (2025-04-22T10:45:23Z)
Geometric Meta-Learning via Coupled Ricci Flow: Unifying Knowledge Representation and Quantum Entanglement [7.410691988131121]
This paper establishes a unified framework integrating geometric flows with deep learning through three fundamental innovations. First, we propose a thermodynamically coupled Ricci flow that dynamically adapts parameter space geometry to loss landscape topology. Second, we derive explicit phase transition thresholds and critical learning rates through curvature blowup analysis. Third, we establish an AdS/CFT-type holographic duality between neural networks and conformal field theories.
arXiv Detail & Related papers (2025-03-25T17:32:31Z)
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry [3.859930277034918]
Boundary representation (B-rep) of geometric models is a fundamental format in Computer-Aided Design (CAD). We propose DTGBrepGen, a novel topology-geometry decoupled framework for B-rep generation.
arXiv Detail & Related papers (2025-03-17T12:34:14Z)
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models [10.443672399225983]
Vision-language models (VLMs) have made significant progress in various multimodal tasks. They still struggle with geometry problems and are significantly limited by their inability to perform mathematical operations not seen during pre-training. We present GeoCoder, which leverages modular code-finetuning to generate and execute code using a predefined geometry function library.
arXiv Detail & Related papers (2024-10-17T12:56:52Z)
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation [15.931398242118073]
GPT-4 and GPT-4V are used to generate basic geometry problems with aligned text and images. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks.
arXiv Detail & Related papers (2024-06-17T13:04:27Z)
GOLD: Geometry Problem Solver with Natural Language Description [7.9345421580482185]
We present the Geometry problem sOlver with natural Language Description (GOLD) model. GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram. It converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems.
arXiv Detail & Related papers (2024-05-01T13:00:51Z)
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs). Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts. We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z)
A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications [67.33002207179923]
This paper presents a survey of data structures, models, and applications related to geometric GNNs. We provide a unified view of existing models from the geometric message passing perspective. We also summarize the applications as well as the related datasets to facilitate later research for methodology development and experimental evaluation.
arXiv Detail & Related papers (2024-03-01T12:13:04Z)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities. G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
Topological Obstructions and How to Avoid Them [22.45861345237023]
We show that local optima can arise due to singularities or an incorrect degree or winding number. We propose a new flow-based model that maps data points to multimodal distributions over geometric spaces.
arXiv Detail & Related papers (2023-12-12T18:56:14Z)
RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching). To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth. We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
Community Recovery in the Geometric Block Model [38.77098549680883]
We show that a simple triangle-counting algorithm to detect communities in the geometric block model is near-optimal. We also show that our algorithm performs extremely well, both theoretically and practically.
arXiv Detail & Related papers (2022-06-22T18:10:49Z)
Robust and Accurate Superquadric Recovery: a Probabilistic Approach [29.7543198254021]
We propose the first probabilistic method to recover superquadrics from point clouds. Our method outperforms the state-of-the-art in terms of accuracy, efficiency, and robustness on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-11-29T13:17:17Z)
ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs [73.86041481470261]
Cone Embeddings (ConE) is the first geometry-based query embedding model that can handle conjunction, disjunction, and negation. ConE significantly outperforms existing state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2021-10-26T14:04:02Z)
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. We propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z)
Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. We present a number of applications where the use of multiple geometric models improves accuracy. These include pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
Tensor network models of AdS/qCFT [69.6561021616688]
We introduce the notion of a quasiperiodic conformal field theory (qCFT). We show that qCFT can be best understood as belonging to a paradigm of discrete holography.
arXiv Detail & Related papers (2020-04-08T18:00:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.