Navigate Complex Physical Worlds via Geometrically Constrained LLM
- URL: http://arxiv.org/abs/2410.17529v1
- Date: Wed, 23 Oct 2024 03:14:07 GMT
- Title: Navigate Complex Physical Worlds via Geometrically Constrained LLM
- Authors: Yongqiang Huang, Wentao Ye, Liyao Li, Junbo Zhao
- Abstract summary: The study introduces a set of geometric conventions and develops a workflow based on multi-layer graphs and multi-agent system frameworks.
The study employs a genetic algorithm, inspired by large-scale model knowledge, to solve geometric constraint problems.
- Score: 10.89488333922071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the potential of Large Language Models (LLMs) for reconstructing and constructing the physical world solely based on textual knowledge. It explores the impact of model performance on spatial understanding abilities. To enhance the comprehension of geometric and spatial relationships in the complex physical world, the study introduces a set of geometric conventions and develops a workflow based on multi-layer graphs and multi-agent system frameworks. It examines how LLMs achieve multi-step and multi-objective geometric inference in a spatial environment using multi-layer graphs under unified geometric conventions. Additionally, the study employs a genetic algorithm, inspired by large-scale model knowledge, to solve geometric constraint problems. In summary, this work innovatively explores the feasibility of using text-based LLMs as physical world builders and designs a workflow to enhance their capabilities.
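To make the constraint-solving step concrete, here is an illustrative toy sketch of a genetic algorithm that places 2-D objects subject to pairwise minimum-distance constraints. All parameters, operators, and the fitness function are hypothetical stand-ins, not the authors' implementation; in the paper's workflow the constraints themselves would be supplied by the LLM under the unified geometric conventions.

```python
# Illustrative sketch only: a simple genetic algorithm that places 2-D points
# subject to pairwise minimum-distance constraints. Parameters and operators
# are hypothetical stand-ins, not the authors' implementation.
import random

N_OBJECTS = 5          # objects to place in a 10 x 10 scene
MIN_DIST = 2.0         # required minimum pairwise distance
POP_SIZE, GENERATIONS, MUT_RATE = 60, 200, 0.1

def random_layout():
    return [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(N_OBJECTS)]

def violation(layout):
    """Sum of constraint violations; 0.0 means every constraint is satisfied."""
    total = 0.0
    for i in range(N_OBJECTS):
        for j in range(i + 1, N_OBJECTS):
            dx = layout[i][0] - layout[j][0]
            dy = layout[i][1] - layout[j][1]
            dist = (dx * dx + dy * dy) ** 0.5
            total += max(0.0, MIN_DIST - dist)
    return total

def crossover(a, b):
    cut = random.randrange(1, N_OBJECTS)
    return a[:cut] + b[cut:]

def mutate(layout):
    return [
        (x + random.gauss(0, 0.5), y + random.gauss(0, 0.5))
        if random.random() < MUT_RATE else (x, y)
        for x, y in layout
    ]

population = [random_layout() for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    population.sort(key=violation)            # lower violation = fitter
    if violation(population[0]) == 0.0:
        break
    parents = population[: POP_SIZE // 2]     # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

population.sort(key=violation)
print("best layout:", population[0], "violation:", violation(population[0]))
```

In the paper's framing, the multi-layer graph and multi-agent components would propose and refine such constraints; the evolutionary search only finds a layout that satisfies them.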
Related papers
- Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration [57.95306827012784]
We propose GeoGen, a pipeline that can automatically generate step-wise reasoning paths for geometry diagrams.
By leveraging precise symbolic reasoning, GeoGen produces large-scale, high-quality question-answer pairs.
We train GeoLogic, a Large Language Model (LLM), using synthetic data generated by GeoGen.
arXiv Detail & Related papers (2025-04-17T09:13:46Z)
- MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams [65.02628814094639]
Diagrams serve as a fundamental form of visual language, representing complex concepts and their inter-relationships through structured symbols, shapes, and spatial arrangements.
Current benchmarks conflate perceptual and reasoning tasks, making it difficult to assess whether Multimodal Large Language Models genuinely understand mathematical diagrams beyond superficial pattern recognition.
We introduce MATHGLANCE, a benchmark specifically designed to isolate and evaluate mathematical perception in MLLMs.
We construct GeoPeP, a perception-oriented dataset of 200K structured geometry image-text pairs annotated with geometric primitives and precise spatial relationships.
arXiv Detail & Related papers (2025-03-26T17:30:41Z)
- OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (MLLMs) have opened new frontiers in artificial intelligence.
We propose an MLLM (OmniGeo) tailored to geospatial applications.
By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
- Do Large Language Models Truly Understand Geometric Structures? [15.915781154075615]
We introduce the GeomRel dataset to evaluate large language models' understanding of geometric structures.
We propose the Geometry Chain-of-Thought (GeoCoT) method, which enhances LLMs' ability to identify geometric relationships.
arXiv Detail & Related papers (2025-01-23T15:52:34Z)
- GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models [34.647839550142834]
We introduce GePBench, a novel benchmark designed to assess the geometric perception abilities of MLLMs.
Our evaluations reveal that current state-of-the-art MLLMs exhibit significant deficiencies in geometric perception tasks.
We show that models trained with GePBench data demonstrate substantial improvements on a wide range of benchmark tasks.
arXiv Detail & Related papers (2024-12-30T16:01:43Z)
- Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions.
Our approach uses diffusion models with a novel network architecture to learn surface point distributions.
We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z)
- Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation [57.59506688299817]
Latent representation alignment is used to map embeddings from different modalities into a shared space, often aligned with the embedding space of large language models (LLMs).
Preliminary protein-focused multimodal large language models (MLLMs) have emerged, but they have predominantly relied on approaches that lack a fundamental understanding of optimal alignment practices across representations.
In this study, we explore the alignment of multimodal representations between LLMs and Geometric Deep Models (GDMs) in the protein domain.
Our work examines alignment factors from both model and protein perspectives, identifying challenges in current alignment methodologies and proposing strategies to improve the alignment process.
arXiv Detail & Related papers (2024-11-08T04:15:08Z)
- Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver [11.69164802295844]
We introduce a new framework that integrates visual features, geometric formal language, and natural language representations.
We propose a novel synthetic data approach and create a large-scale geometric dataset, SynthGeo228K, annotated with both formal and natural language captions.
Our framework improves MLLMs' ability to process geometric diagrams and extends their application to open-ended tasks on the formalgeo7k dataset.
arXiv Detail & Related papers (2024-09-06T12:11:06Z)
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains under consistent notation, a synthesis missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
- Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models [28.819559978685806]
Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored.
We investigate LLMs' abilities in constructive geometric problem-solving, one of the most fundamental steps in the development of human mathematical reasoning.
Our work reveals notable challenges that the state-of-the-art LLMs face in this domain despite many successes in similar areas.
arXiv Detail & Related papers (2024-02-06T10:37:21Z)
- When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges [50.280704114978384]
Pre-trained large language models (LLMs) exhibit powerful capabilities for generating natural text.
Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems.
arXiv Detail & Related papers (2024-01-19T05:58:30Z)
- G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities.
G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- MechGPT, a language-based strategy for mechanics and materials modeling that connects knowledge across scales, disciplines and modalities [0.0]
We use a Large Language Model (LLM) to distill question-answer pairs from raw sources followed by fine-tuning.
The resulting MechGPT LLM foundation model is used in a series of computational experiments to explore its capacity for knowledge retrieval, various language tasks, hypothesis generation, and connecting knowledge across disparate areas.
arXiv Detail & Related papers (2023-10-16T14:29:35Z)
- Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations [2.8935588665357086]
This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations.
We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors; a minimal sketch of this pipeline appears after this list.
Experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects.
arXiv Detail & Related papers (2023-07-05T03:50:08Z)
- The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning [62.601681746034956]
Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision.
We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each.
arXiv Detail & Related papers (2022-09-18T18:15:38Z)
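For the entry "Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations" above, a minimal sketch of the described pipeline, encoding WKT geometry strings with BERT and feeding pooled embeddings to a classifier, might look as follows. The model name, sample data, and labels are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch only (not the paper's code): embed well-known text (WKT)
# geometry strings with BERT and train a simple geometry-type classifier on the
# mean-pooled embeddings. Model name, samples, and labels are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

wkt_samples = [
    ("POINT (30 10)", "point"),
    ("LINESTRING (30 10, 10 30, 40 40)", "linestring"),
    ("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))", "polygon"),
    ("POINT (5 5)", "point"),
    ("LINESTRING (0 0, 1 1)", "linestring"),
    ("POLYGON ((0 0, 0 1, 1 1, 0 0))", "polygon"),
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(wkt):
    """Mean-pool the last hidden states of the WKT string into a fixed-size vector."""
    inputs = tokenizer(wkt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).tolist()

X = [embed(w) for w, _ in wkt_samples]
y = [label for _, label in wkt_samples]

clf = LogisticRegression(max_iter=1000).fit(X, y)    # geometry-type classifier
print(clf.predict([embed("POINT (12 7)")]))          # expected: ['point']
```

The paper additionally feeds such embeddings into regressors to estimate numeric values, which is where the reported difficulties (beyond the roughly 73% relation accuracy) arise.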