Reasoning in Large Language Models: A Geometric Perspective
- URL: http://arxiv.org/abs/2407.02678v1
- Date: Tue, 2 Jul 2024 21:39:53 GMT
- Title: Reasoning in Large Language Models: A Geometric Perspective
- Authors: Romain Cosentino, Sarath Shekkizhar,
- Abstract summary: We explore the reasoning abilities of large language models (LLMs) through their geometrical understanding.
We establish a connection between the expressive power of LLMs and the density of their self-attention graphs.
- Score: 4.2909314120969855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this work, we explore the reasoning abilities of large language models (LLMs) through their geometrical understanding. We establish a connection between the expressive power of LLMs and the density of their self-attention graphs. Our analysis demonstrates that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks. We demonstrate through theoretical analysis and toy examples that a higher intrinsic dimension implies a greater expressive capacity of the LLM. We further provide empirical evidence linking this geometric framework to recent advancements in methods aimed at enhancing the reasoning capabilities of LLMs.
Related papers
- Learning on Graphs with Large Language Models(LLMs): A Deep Dive into Model Robustness [39.57155321515097]
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks.
It remains unclear whether LLMs exhibit robustness in learning on graphs.
arXiv Detail & Related papers (2024-07-16T09:05:31Z) - SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models [70.01883340129204]
spatial reasoning is a crucial component of both biological and artificial intelligence.
We present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning.
arXiv Detail & Related papers (2024-06-07T01:06:34Z) - Characterizing Truthfulness in Large Language Model Generations with
Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs)
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large
Language Models [28.819559978685806]
Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored.
We investigate LLMs' abilities in constructive geometric problem-solving one of the most fundamental steps in the development of human mathematical reasoning.
Our work reveals notable challenges that the state-of-the-art LLMs face in this domain despite many successes in similar areas.
arXiv Detail & Related papers (2024-02-06T10:37:21Z) - A Principled Framework for Knowledge-enhanced Large Language Model [58.1536118111993]
Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning.
This paper introduces a rigorously designed framework for creating LLMs that effectively anchor knowledge and employ a closed-loop reasoning process.
arXiv Detail & Related papers (2023-11-18T18:10:02Z) - Disentangled Representation Learning with Large Language Models for
Text-Attributed Graphs [57.052160123387104]
We present the Disentangled Graph-Text Learner (DGTL) model, which is able to enhance the reasoning and predicting capabilities of LLMs for TAGs.
Our proposed DGTL model incorporates graph structure information through tailored disentangled graph neural network (GNN) layers.
Experimental evaluations demonstrate the effectiveness of the proposed DGTL model on achieving superior or comparable performance over state-of-the-art baselines.
arXiv Detail & Related papers (2023-10-27T14:00:04Z) - Why Can Large Language Models Generate Correct Chain-of-Thoughts? [10.888196404348093]
We introduce a two-level hierarchical graphical model tailored for natural language generation.
We establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts.
arXiv Detail & Related papers (2023-10-20T15:09:46Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach [0.0]
Large Language Models (LLMs) have showcased impressive reasoning capabilities.
In this paper, we introduce a novel graph-based method to further augment the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2023-08-18T03:12:59Z) - Large Language Models are In-Context Semantic Reasoners rather than
Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite of numerous successful applications, the underlying mechanism of such in-context capabilities still remains unclear.
In this work, we hypothesize that the learned textitsemantics of language tokens do the most heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.