Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning
- URL: http://arxiv.org/abs/2504.03635v1
- Date: Fri, 04 Apr 2025 17:57:22 GMT
- Title: Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning
- Authors: Xinyi Wang, Shawn Tan, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen
- Abstract summary: We introduce a synthetic multi-hop reasoning environment designed to replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling law that linearly maps the knowledge graph search entropy to the optimal model size.
- Score: 89.17086632436363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks requiring complex reasoning. However, the effects of scaling on their reasoning abilities remain insufficiently understood. In this paper, we introduce a synthetic multi-hop reasoning environment designed to closely replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To evaluate this, we pretrain language models (LMs) from scratch solely on triples from the incomplete graph and assess their ability to infer the missing edges. Interestingly, we observe that overparameterization can impair reasoning performance due to excessive memorization. We investigate different factors that affect the resulting U-shaped loss curve, including graph structure, model size, and training steps. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling law that linearly maps the knowledge graph search entropy to the optimal model size. This work provides new insights into the relationship between scaling and reasoning in LLMs, shedding light on possible ways to optimize their performance for reasoning tasks.
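To make the setup described in the abstract concrete, here is a minimal Python sketch of that kind of pipeline: sample a toy knowledge graph, hold out edges as the reasoning targets, linearize the remaining triples into pretraining text, and apply a linear map from a graph "search entropy" estimate to a predicted optimal model size. The graph generator, the entropy definition, and the coefficients `a` and `b` below are illustrative assumptions, not the paper's actual environment or fitted values.

```python
import math
import random
from collections import defaultdict

def make_synthetic_kg(num_entities=1000, num_relations=20, num_edges=20000, seed=0):
    """Sample a toy knowledge graph as (head, relation, tail) triples.
    Real-world KGs are heavy-tailed; this uniform sampler is only a placeholder."""
    rng = random.Random(seed)
    triples = set()
    while len(triples) < num_edges:
        h, r, t = rng.randrange(num_entities), rng.randrange(num_relations), rng.randrange(num_entities)
        if h != t:
            triples.add((h, r, t))
    return sorted(triples)

def split_edges(triples, held_out_frac=0.1, seed=0):
    """Hold out a fraction of edges: the LM is pretrained only on the rest
    and evaluated on whether it can infer the held-out (missing) edges."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * held_out_frac)
    return shuffled[k:], shuffled[:k]  # (train, held_out)

def linearize(triples):
    """Turn triples into plain-text sequences for from-scratch LM pretraining."""
    return [f"<e{h}> <r{r}> <e{t}>" for h, r, t in triples]

def search_entropy(triples):
    """A stand-in 'search entropy': mean entropy (bits) of the uniform
    next-hop distribution when expanding a node, i.e. mean log2(out-degree).
    The paper's actual definition may differ."""
    out_degree = defaultdict(int)
    for h, _, _ in triples:
        out_degree[h] += 1
    return sum(math.log2(d) for d in out_degree.values()) / len(out_degree)

def predict_optimal_model_size(entropy_bits, a=1.0, b=0.0):
    """The abstract reports a linear map from KG search entropy to optimal
    model size; a and b here are placeholders, not the reported fit."""
    return a * entropy_bits + b

if __name__ == "__main__":
    kg = make_synthetic_kg()
    train, held_out = split_edges(kg)
    corpus = linearize(train)        # pretraining text
    targets = linearize(held_out)    # missing edges to infer at evaluation time
    print(len(corpus), len(targets), predict_optimal_model_size(search_entropy(train)))
```

In the paper, the quantity on the right-hand side is the parameter count at which the U-shaped reasoning loss bottoms out; the placeholder coefficients above would have to be fitted against such measurements.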
Related papers
- Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs [4.701165676405066]
It is critical not only to retrieve relevant information but also to provide causal reasoning and explainability. This paper proposes a novel pipeline that filters large knowledge graphs to emphasize cause-effect edges. Experiments on medical question-answering tasks show consistent gains, with up to a 10% absolute improvement.
arXiv Detail & Related papers (2025-01-24T19:31:06Z)
- Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs). We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across these fundamental tasks.
arXiv Detail & Related papers (2024-08-18T16:26:39Z)
- Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation [110.71955853831707]
We view LMs as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time.
We formalize the reasoning paths as random walk paths on the knowledge/reasoning graphs.
Experiments and analysis on multiple KG and CoT datasets reveal the effect of training on random walk paths (a minimal random-walk sampler sketch is given after this list).
arXiv Detail & Related papers (2024-02-05T18:25:51Z)
- GraphLLM: Boosting Graph Reasoning Ability of Large Language Model [7.218768686958888]
GraphLLM is a pioneering end-to-end approach that integrates graph learning models with Large Language Models.
Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM.
The results show an average accuracy improvement of 54.44%, alongside a context reduction of 96.45%.
arXiv Detail & Related papers (2023-10-09T16:42:00Z)
- GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach [0.0]
Large Language Models (LLMs) have showcased impressive reasoning capabilities.
In this paper, we introduce a novel graph-based method to further augment the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2023-08-18T03:12:59Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not capture the fact that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z)
- Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z)
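The "Reasoning Paths Aggregation" entry above formalizes pre-training reasoning paths as random walks over a knowledge/reasoning graph. Below is a minimal sketch of such a relation-labelled random-walk sampler; the triple format, walk length, and stopping behaviour are assumptions made here for illustration, not details taken from that paper.

```python
import random
from collections import defaultdict

def sample_random_walk_paths(triples, num_paths=5, max_hops=3, seed=0):
    """Sample relation-labelled random walks over a KG given as
    (head, relation, tail) triples. Each path alternates entities and
    relations and can be read as a candidate multi-hop reasoning chain."""
    rng = random.Random(seed)
    outgoing = defaultdict(list)
    for h, r, t in triples:
        outgoing[h].append((r, t))
    start_nodes = list(outgoing)
    paths = []
    for _ in range(num_paths):
        node = rng.choice(start_nodes)
        path = [node]
        for _ in range(max_hops):
            if node not in outgoing:
                break  # dead end: stop this walk early
            rel, nxt = rng.choice(outgoing[node])
            path += [rel, nxt]
            node = nxt
        paths.append(path)
    return paths

# Example: paths over a tiny 3-triple graph.
print(sample_random_walk_paths([("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c")]))
```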