Related papers: GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation


URL: http://arxiv.org/abs/2406.11503v1
Date: Mon, 17 Jun 2024 13:04:27 GMT
Title: GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng,
Abstract summary: GPT-4 and GPT-4V are used to generate basic geometry problems with aligned text and images. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project
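The abstract identifies misalignment between problem text and image as a key weakness of existing data. The snippet below is not the GeoGPT4V pipeline, which according to the abstract relies on GPT-4 and GPT-4V. It is a minimal, hypothetical illustration of one way to guarantee alignment for a basic problem: derive the question text, the answer, and the diagram coordinates from the same parameters, then check that the drawn figure agrees with the stated answer. The names `GeometryProblem`, `make_right_triangle_problem`, and `is_aligned` are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class GeometryProblem:
    question: str
    answer: float
    # Vertex coordinates a renderer could draw. They come from the same
    # parameters as the question text, so image and text cannot disagree.
    vertices: dict[str, tuple[float, float]]


def make_right_triangle_problem(leg_a: float, leg_b: float) -> GeometryProblem:
    """Build a basic hypotenuse problem plus matching diagram coordinates."""
    if leg_a <= 0 or leg_b <= 0:
        raise ValueError("leg lengths must be positive")
    # Right angle at C, with legs AC and BC along the coordinate axes.
    vertices = {"C": (0.0, 0.0), "A": (float(leg_a), 0.0), "B": (0.0, float(leg_b))}
    question = (
        f"In right triangle ABC, angle C = 90°, AC = {leg_a:g} and BC = {leg_b:g}. "
        "Find the length of AB."
    )
    answer = math.hypot(leg_a, leg_b)
    return GeometryProblem(question, answer, vertices)


def is_aligned(problem: GeometryProblem, tol: float = 1e-9) -> bool:
    """Check that the hypotenuse as drawn matches the stated answer."""
    length_ab = math.dist(problem.vertices["A"], problem.vertices["B"])
    return abs(length_ab - problem.answer) <= tol
```

For example, `make_right_triangle_problem(3, 4)` yields a question stating AC = 3 and BC = 4, an answer of 5.0, and coordinates whose AB distance is also 5.0, so `is_aligned` returns `True`. A real pipeline would still need a renderer and far more varied problem templates.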

Related papers

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving [66.0201510984171]
We propose a scalable data engine called TrustGeoGen for problem generation. Through formal verification, TrustGeoGen produces the GeoTrust-200K dataset with guaranteed modality integrity. Experiments reveal that state-of-the-art models achieve only 49.17% accuracy on GeoTrust-test.
arXiv Detail & Related papers (2025-04-22T10:45:23Z)
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions [9.55713776359176]
We propose GeoUni, the first unified geometry expert model capable of generating problem solutions and diagrams within a single framework. With only 1.5B parameters, GeoUni achieves performance comparable to much larger models such as DeepSeek-R1 (671B parameters) on geometric reasoning tasks. GeoUni also excels at generating precise geometric diagrams, surpassing both text-to-image models and unified models, including GPT-4o's image generation.
arXiv Detail & Related papers (2025-04-14T11:56:55Z)
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training [45.42400674977197]
GeoX is a multi-modal large model focusing on geometric understanding and reasoning tasks. We introduce unimodal pre-training to develop a diagram encoder and symbol decoder, enhancing the understanding of geometric images and corpora. We propose a Generator-And-Sampler Transformer (GS-Former) to generate discriminative queries and eliminate uninformative representations from unevenly distributed geometric signals.
arXiv Detail & Related papers (2024-12-16T15:20:03Z)
Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning [4.4615747404424395]
Geometry math problems pose significant challenges for large language models (LLMs). We collect a geometry question-answer dataset, referred to as GeoMath, by sourcing geometric data from Chinese high school education websites. We propose a Large Multi-modal Model (LMM) framework named Geo-LLaVA, which combines retrieval augmentation with supervised fine-tuning (SFT) in the training stage, called meta-training, and employs in-context learning (ICL) during inference to improve performance.
arXiv Detail & Related papers (2024-12-12T07:34:09Z)
Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions. Our approach uses diffusion models with a novel network architecture to learn surface point distributions. We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z)
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models [86.06825304372613]
We propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results.
arXiv Detail & Related papers (2024-10-23T13:58:39Z)
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design [0.0]
GeoBiked is curated to contain 4,355 bicycle images, annotated with structural and technical features. We propose methods to automate data labeling by utilizing large-scale foundation models.
arXiv Detail & Related papers (2024-09-25T15:57:59Z)
AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding [18.223835101407637]
This paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images. By leveraging precisely defined geometric clauses, AutoGeo builds AutoGeo-100k, which contains a wide variety of geometric shapes. Experimental results indicate significant improvements in the model's ability to handle geometric images.
arXiv Detail & Related papers (2024-08-28T14:49:26Z)
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
GOLD: Geometry Problem Solver with Natural Language Description [7.9345421580482185]
We present the Geometry problem sOlver with natural Language Description (GOLD) model. GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram. It converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems.
arXiv Detail & Related papers (2024-05-01T13:00:51Z)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable human-level reasoning and generation capabilities. G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which require a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. We propose a Geometric Question Answering dataset, GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z)
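GeoQA pairs each problem with an annotated program that computes the answer. The sketch below is a hypothetical interpreter for programs of that general shape, showing how such a program can be executed step by step. The operator names, the operand convention (`N<i>` for the i-th number in the problem text, `C<value>` for a constant, `V<j>` for the result of step j), and `run_program` are assumptions for illustration, not GeoQA's actual format.

```python
from typing import Callable

# Hypothetical operator table; GeoQA's real operator vocabulary is not reproduced here.
OPS: dict[str, Callable[..., float]] = {
    "add": lambda x, y: x + y,
    "minus": lambda x, y: x - y,
    "multiply": lambda x, y: x * y,
    "divide": lambda x, y: x / y,
    "half": lambda x: x / 2,
}


def run_program(program: list[tuple[str, list[str]]], numbers: list[float]) -> float:
    """Execute the program step by step and return the last step's result."""
    results: list[float] = []

    def resolve(token: str) -> float:
        kind, rest = token[0], token[1:]
        if kind == "N":  # number extracted from the problem text
            return numbers[int(rest)]
        if kind == "C":  # literal constant, e.g. C180
            return float(rest)
        if kind == "V":  # result of an earlier step
            return results[int(rest)]
        raise ValueError(f"unknown operand {token!r}")

    for op, args in program:
        results.append(OPS[op](*(resolve(a) for a in args)))
    if not results:
        raise ValueError("empty program")
    return results[-1]
```

For a problem stating that two angles of a triangle measure 50° and 60°, the program `[("add", ["N0", "N1"]), ("minus", ["C180", "V0"])]` with `numbers=[50, 60]` first sums the known angles (110) and then subtracts from 180, returning 70.0 for the third angle.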
Graph Signal Processing for Geometric Data and Beyond: Theory and Applications [55.81966207837108]
Graph Signal Processing (GSP) enables processing signals that reside on irregular domains. We review GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs. We also discuss recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP.
arXiv Detail & Related papers (2020-08-05T03:20:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.