Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning
- URL: http://arxiv.org/abs/2504.03635v1
- Date: Fri, 04 Apr 2025 17:57:22 GMT
- Title: Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning
- Authors: Xinyi Wang, Shawn Tan, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen
- Abstract summary: We introduce a synthetic multi-hop reasoning environment designed to replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling law that linearly maps the knowledge graph search entropy to the optimal model size.
- Score: 89.17086632436363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks requiring complex reasoning. However, the effects of scaling on their reasoning abilities remain insufficiently understood. In this paper, we introduce a synthetic multi-hop reasoning environment designed to closely replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To evaluate this, we pretrain language models (LMs) from scratch solely on triples from the incomplete graph and assess their ability to infer the missing edges. Interestingly, we observe that overparameterization can impair reasoning performance due to excessive memorization. We investigate different factors that affect the resulting U-shaped loss curve, including graph structure, model size, and training steps. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling law that linearly maps the knowledge graph search entropy to the optimal model size. This work provides new insights into the relationship between scaling and reasoning in LLMs, shedding light on possible ways to optimize their performance for reasoning tasks.
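To make the setup described in the abstract concrete, here is a minimal Python sketch of that kind of pipeline: sample a toy knowledge graph, hold out edges as the reasoning targets, linearize the remaining triples into pretraining text, and apply a linear map from a graph "search entropy" estimate to a predicted optimal model size. The graph generator, the entropy definition, and the coefficients `a` and `b` below are illustrative assumptions, not the paper's actual environment or fitted values.

```python
import math
import random
from collections import defaultdict

def make_synthetic_kg(num_entities=1000, num_relations=20, num_edges=20000, seed=0):
    """Sample a toy knowledge graph as (head, relation, tail) triples.
    Real-world KGs are heavy-tailed; this uniform sampler is only a placeholder."""
    rng = random.Random(seed)
    triples = set()
    while len(triples) < num_edges:
        h, r, t = rng.randrange(num_entities), rng.randrange(num_relations), rng.randrange(num_entities)
        if h != t:
            triples.add((h, r, t))
    return sorted(triples)

def split_edges(triples, held_out_frac=0.1, seed=0):
    """Hold out a fraction of edges: the LM is pretrained only on the rest
    and evaluated on whether it can infer the held-out (missing) edges."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * held_out_frac)
    return shuffled[k:], shuffled[:k]  # (train, held_out)

def linearize(triples):
    """Turn triples into plain-text sequences for from-scratch LM pretraining."""
    return [f"<e{h}> <r{r}> <e{t}>" for h, r, t in triples]

def search_entropy(triples):
    """A stand-in 'search entropy': mean entropy (bits) of the uniform
    next-hop distribution when expanding a node, i.e. mean log2(out-degree).
    The paper's actual definition may differ."""
    out_degree = defaultdict(int)
    for h, _, _ in triples:
        out_degree[h] += 1
    return sum(math.log2(d) for d in out_degree.values()) / len(out_degree)

def predict_optimal_model_size(entropy_bits, a=1.0, b=0.0):
    """The abstract reports a linear map from KG search entropy to optimal
    model size; a and b here are placeholders, not the reported fit."""
    return a * entropy_bits + b

if __name__ == "__main__":
    kg = make_synthetic_kg()
    train, held_out = split_edges(kg)
    corpus = linearize(train)        # pretraining text
    targets = linearize(held_out)    # missing edges to infer at evaluation time
    print(len(corpus), len(targets), predict_optimal_model_size(search_entropy(train)))
```

In the paper, the quantity on the right-hand side is the parameter count at which the U-shaped reasoning loss bottoms out; the placeholder coefficients above would have to be fitted against such measurements.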
Related papers
- Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs [4.701165676405066]
It is critical not only to retrieve relevant information but also to provide causal reasoning and explainability. This paper proposes a novel pipeline that filters large knowledge graphs to emphasize cause-effect edges. Experiments on medical question-answering tasks show consistent gains, with up to a 10% absolute improvement.
arXiv Detail & Related papers (2025-01-24T19:31:06Z)
- Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs). We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across these fundamental tasks.
arXiv Detail & Related papers (2024-08-18T16:26:39Z)
- Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation [110.71955853831707]
We view LMs as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time.
We formalize the reasoning paths as random walk paths on the knowledge/reasoning graphs.
Experiments and analysis on multiple KG and CoT datasets reveal the effect of training on random walk paths (a minimal random-walk sampler sketch is given after this list).
arXiv Detail & Related papers (2024-02-05T18:25:51Z)
- GraphLLM: Boosting Graph Reasoning Ability of Large Language Model [7.218768686958888]
GraphLLM is a pioneering end-to-end approach that integrates graph learning models with Large Language Models.
Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM.
The results show an average accuracy improvement of 54.44%, alongside a context reduction of 96.45%.
arXiv Detail & Related papers (2023-10-09T16:42:00Z)
- GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach [0.0]
Large Language Models (LLMs) have showcased impressive reasoning capabilities.
In this paper, we introduce a novel graph-based method to further augment the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2023-08-18T03:12:59Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not capture the fact that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z)
- Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z)
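The "Reasoning Paths Aggregation" entry above formalizes pre-training reasoning paths as random walks over a knowledge/reasoning graph. Below is a minimal sketch of such a relation-labelled random-walk sampler; the triple format, walk length, and stopping behaviour are assumptions made here for illustration, not details taken from that paper.

```python
import random
from collections import defaultdict

def sample_random_walk_paths(triples, num_paths=5, max_hops=3, seed=0):
    """Sample relation-labelled random walks over a KG given as
    (head, relation, tail) triples. Each path alternates entities and
    relations and can be read as a candidate multi-hop reasoning chain."""
    rng = random.Random(seed)
    outgoing = defaultdict(list)
    for h, r, t in triples:
        outgoing[h].append((r, t))
    start_nodes = list(outgoing)
    paths = []
    for _ in range(num_paths):
        node = rng.choice(start_nodes)
        path = [node]
        for _ in range(max_hops):
            if node not in outgoing:
                break  # dead end: stop this walk early
            rel, nxt = rng.choice(outgoing[node])
            path += [rel, nxt]
            node = nxt
        paths.append(path)
    return paths

# Example: paths over a tiny 3-triple graph.
print(sample_random_walk_paths([("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c")]))
```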