Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
- URL: http://arxiv.org/abs/2504.20752v1
- Date: Tue, 29 Apr 2025 13:33:29 GMT
- Title: Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
- Authors: Roman Abramov, Felix Steinbauer, Gjergji Kasneci
- Abstract summary: We extend grokking to real-world factual data and address the challenge of dataset sparsity. Surprisingly, we find that even factually incorrect synthetic data can strengthen emergent reasoning circuits. Our approach achieves up to 95-100% accuracy on multi-hop reasoning benchmarks.
- Score: 9.50669909278749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have achieved great success in numerous NLP tasks but continue to exhibit notable gaps in multi-step factual reasoning, especially when real-world knowledge is sparse. Recent advances in grokking have demonstrated that neural networks can transition from memorizing to perfectly generalizing once they detect underlying logical patterns - yet these studies have primarily used small, synthetic tasks. In this paper, for the first time, we extend grokking to real-world factual data and address the challenge of dataset sparsity by augmenting existing knowledge graphs with carefully designed synthetic data to raise the ratio $\phi_r$ of inferred facts to atomic facts above the threshold required for grokking. Surprisingly, we find that even factually incorrect synthetic data can strengthen emergent reasoning circuits rather than degrade accuracy, as it forces the model to rely on relational structure rather than memorization. When evaluated on multi-hop reasoning benchmarks, our approach achieves up to 95-100% accuracy on 2WikiMultiHopQA - substantially improving over strong baselines and matching or exceeding current state-of-the-art results. We further provide an in-depth analysis of how increasing $\phi_r$ drives the formation of generalizing circuits inside Transformers. Our findings suggest that grokking-based data augmentation can unlock implicit multi-hop reasoning capabilities, opening the door to more robust and interpretable factual reasoning in large-scale language models.
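The abstract's key quantity is the ratio $\phi_r$ of inferred facts to atomic facts in the training data. As an illustrative sketch only (not the authors' code), the counting behind $\phi_r$ can be made concrete on a toy knowledge graph: store atomic facts as (head, relation, tail) triples, enumerate the 2-hop facts they entail by relation composition, and take the ratio. In the paper, synthetic facts are then added until this ratio exceeds the grokking threshold; the sketch below only measures it.

```python
from collections import defaultdict

# Toy knowledge graph: atomic facts stored as (head, relation, tail) triples.
atomic_facts = {
    ("alice", "mother", "beth"),
    ("beth", "mother", "carol"),
    ("carol", "mother", "dana"),
    ("alice", "spouse", "adam"),
}

# Index outgoing edges by head entity for fast 2-hop composition.
by_head = defaultdict(list)
for h, r, t in atomic_facts:
    by_head[h].append((r, t))

# Infer 2-hop composed facts: (x, r1/r2, z) from (x, r1, y) and (y, r2, z).
inferred_facts = set()
for h, r1, t in atomic_facts:
    for r2, t2 in by_head[t]:
        inferred_facts.add((h, f"{r1}/{r2}", t2))

# phi_r: ratio of inferred facts to atomic facts.
phi_r = len(inferred_facts) / len(atomic_facts)
print(f"atomic={len(atomic_facts)} inferred={len(inferred_facts)} phi_r={phi_r:.2f}")
```

Here only two 2-hop facts are entailed (alice's and beth's grandmothers), so $\phi_r = 2/4 = 0.5$; augmentation in the paper's sense would add synthetic triples to push this ratio above the threshold required for grokking.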
Related papers
- Mixture of Parrots: Experts improve memorization more than reasoning [72.445819694797]
We show that as we increase the number of experts, the memorization performance consistently increases while the reasoning capabilities saturate. We find that increasing the number of experts helps solve knowledge-intensive tasks, but fails to yield the same benefits for reasoning tasks.
arXiv Detail & Related papers (2024-10-24T17:54:41Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency. This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - Discovering physical laws with parallel combinatorial tree search [57.05912962368898]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade. We introduce a parallel combinatorial tree search (PCTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization [22.033370572209744]
We study whether transformers can learn to implicitly reason over parametric knowledge.
We focus on two representative reasoning types, composition and comparison.
We find that transformers can learn implicit reasoning, but only through grokking.
arXiv Detail & Related papers (2024-05-23T21:42:19Z) - FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion [24.964973946366335]
We develop a novel retrieval-based method, FT2Ra, which aims to mimic genuine fine-tuning.
FT2Ra achieves a 4.29% improvement in accuracy compared to the best baseline method on UniXcoder.
arXiv Detail & Related papers (2024-04-02T01:42:15Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification [28.453817513380276]
We develop a rationale-sensitive method to generate linguistically diverse and label-flipping counterfactuals.
Specifically, the diverse and fluent counterfactuals are generated via an Explain-Edit-Generate architecture.
Experimental results show that the proposed approach outperforms the SOTA baselines.
arXiv Detail & Related papers (2023-10-23T02:39:14Z) - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner-workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z) - Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability [30.01308882849197]
We propose a new methodology for creating challenging algorithmic reasoning datasets.
The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language.
We find that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems.
arXiv Detail & Related papers (2021-12-16T17:47:20Z) - On the Robustness and Generalization of Deep Learning Driven Full Waveform Inversion [2.5382095320488665]
Full Waveform Inversion (FWI) is commonly epitomized as an image-to-image translation task.
Despite being trained with synthetic data, the deep learning-driven FWI is expected to perform well when evaluated with sufficient real-world data.
We study such properties by asking: how robust are these deep neural networks and how do they generalize?
arXiv Detail & Related papers (2021-11-28T19:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.