GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
- URL: http://arxiv.org/abs/2406.11503v1
- Date: Mon, 17 Jun 2024 13:04:27 GMT
- Title: GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
- Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng,
- Abstract summary: GPT-4 and GPT-4V are used to generate basic geometry problems with aligned text and images.
We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset.
Results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks.
- Score: 15.931398242118073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project
Related papers
- Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions.
Our approach uses diffusion models with a novel network architecture to learn surface point distributions.
We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z) - R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models [86.06825304372613]
We propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline.
First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions.
We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results.
arXiv Detail & Related papers (2024-10-23T13:58:39Z) - GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design [0.0]
GeoBiked is curated to contain 4 355 bicycle images, annotated with structural and technical features.
We propose methods to automate data labeling by utilizing large-scale foundation models.
arXiv Detail & Related papers (2024-09-25T15:57:59Z) - AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding [18.223835101407637]
This paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images.
By leveraging precisely defined geometric clauses, AutoGeo-100k contains a wide variety of geometric shapes.
Experimental results indicate significant improvements in the model's ability in handling geometric images.
arXiv Detail & Related papers (2024-08-28T14:49:26Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - GOLD: Geometry Problem Solver with Natural Language Description [7.9345421580482185]
We present the Geometry problem sOlver with natural Language Description (GOLD) model.
GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram.
It converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems.
arXiv Detail & Related papers (2024-05-01T13:00:51Z) - G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities.
G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z) - GeoQA: A Geometric Question Answering Benchmark Towards Multimodal
Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge.
We propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z) - Graph Signal Processing for Geometric Data and Beyond: Theory and
Applications [55.81966207837108]
Graph Signal Processing (GSP) enables processing signals that reside on irregular domains.
GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs.
Recently developed Graph Neural Networks (GNNs) interpret the operation of these networks from the perspective of GSP.
arXiv Detail & Related papers (2020-08-05T03:20:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.