GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
- URL: http://arxiv.org/abs/2406.11503v1
- Date: Mon, 17 Jun 2024 13:04:27 GMT
- Title: GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
- Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng,
- Abstract summary: GPT-4 and GPT-4V are used to generate basic geometry problems with aligned text and images.
We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset.
Results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks.
- Score: 15.931398242118073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project
Related papers
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions [22.077366472693395]
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections.
Existing approaches that employ volumetric rendering with neural radiance fields inherit a key limitation: the generated geometry is noisy and unconstrained.
We propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner.
arXiv Detail & Related papers (2024-06-06T17:00:10Z)
- GOLD: Geometry Problem Solver with Natural Language Description [7.9345421580482185]
We present the Geometry problem sOlver with natural Language Description (GOLD) model.
GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram.
It converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems.
arXiv Detail & Related papers (2024-05-01T13:00:51Z)
- A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications [67.33002207179923]
This paper presents a survey of data structures, models, and applications related to geometric GNNs.
We provide a unified view of existing models from the geometric message passing perspective.
We also summarize the applications as well as the related datasets to facilitate later research for methodology development and experimental evaluation.
arXiv Detail & Related papers (2024-03-01T12:13:04Z)
- G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities.
G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
- UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
Two main geometry problems, calculation and proving, are usually treated as two specific tasks.
We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems.
We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z)
- GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge.
We propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z)
- Graph Signal Processing for Geometric Data and Beyond: Theory and Applications [55.81966207837108]
Graph Signal Processing (GSP) enables processing signals that reside on irregular domains.
The paper gives an overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs.
It also discusses recently developed Graph Neural Networks (GNNs) and interprets the operation of these networks from the perspective of GSP.
arXiv Detail & Related papers (2020-08-05T03:20:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.