LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
- URL: http://arxiv.org/abs/2101.06223v1
- Date: Fri, 15 Jan 2021 17:15:24 GMT
- Title: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
- Authors: Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy
- Abstract summary: We replace architecture engineering by encoding inductive bias in datasets.
Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities.
Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks.
- Score: 30.610670366488943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While designing inductive bias in neural architectures has been widely
studied, we hypothesize that transformer networks are flexible enough to learn
inductive bias from suitable generic tasks. Here, we replace architecture
engineering by encoding inductive bias in the form of datasets. Inspired by
Peirce's view that deduction, induction, and abduction form an irreducible set
of reasoning primitives, we design three synthetic tasks that are intended to
require the model to have these three abilities. We specifically design these
synthetic tasks in a way that they are devoid of mathematical knowledge to
ensure that only the fundamental reasoning biases can be learned from these
tasks. This defines a new pre-training methodology called "LIME" (Learning
Inductive bias for Mathematical rEasoning). Models trained with LIME
significantly outperform vanilla transformers on three very different large
mathematical reasoning benchmarks. Unlike traditional pre-training approaches,
which typically dominate the total computation cost, LIME requires only a small
fraction of the computation cost of the typical downstream task.
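As a concrete illustration of the three primitives, Peirce's scheme can be phrased over a shared Rule/Case/Result structure: deduction maps (Rule, Case) to Result, induction maps (Case, Result) to Rule, and abduction maps (Rule, Result) to Case. The sketch below generates such examples over content-free symbol strings in that spirit; the vocabularies, string lengths, and serialization format are illustrative assumptions, not the paper's exact settings.

```python
import random
import string

# Illustrative vocabularies (assumption): "rule symbols" are placeholders
# that get rewritten, "math symbols" are content-free tokens.
RULE_SYMBOLS = list(string.ascii_uppercase[:5])   # A..E
MATH_SYMBOLS = list(string.digits)                # 0..9

def sample_rule(length=8):
    """A Rule: a random string mixing rule symbols and math symbols."""
    return [random.choice(RULE_SYMBOLS + MATH_SYMBOLS) for _ in range(length)]

def sample_case(rule):
    """A Case: a substitution mapping each rule symbol used to a short string."""
    used = sorted({s for s in rule if s in RULE_SYMBOLS})
    return {s: "".join(random.choice(MATH_SYMBOLS)
                       for _ in range(random.randint(1, 3)))
            for s in used}

def apply_case(rule, case):
    """The Result: the Rule with every rule symbol rewritten via the Case."""
    return "".join(case.get(s, s) for s in rule)

def make_example(task):
    """One (source, target) pair for 'deduct', 'induct', or 'abduct'."""
    rule = sample_rule()
    case = sample_case(rule)
    result = apply_case(rule, case)
    rule_s = "".join(rule)
    case_s = ",".join(f"{k}={v}" for k, v in case.items())
    if task == "deduct":   # Rule + Case   -> Result
        return f"{rule_s} ; {case_s}", result
    if task == "induct":   # Case + Result -> Rule
        return f"{case_s} ; {result}", rule_s
    if task == "abduct":   # Rule + Result -> Case
        return f"{rule_s} ; {result}", case_s
    raise ValueError(task)
```

Pre-training on a mixture of the three tasks before fine-tuning on the downstream corpus is the intended use; because the examples are cheap synthetic strings, this stage costs only a small fraction of downstream training.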
Related papers
- Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning [53.685764040547625]
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergent capabilities.
This work provides a fine-grained mathematical analysis of how transformers leverage the multi-concept semantics of words to enable powerful in-context learning (ICL) and excellent out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z)
- On the Inductive Bias of Stacking Towards Improving Reasoning [50.225873619537765]
We propose a variant of gradual stacking called MIDAS that can speed up language model training by up to 40%.
MIDAS is not only training-efficient but surprisingly also has an inductive bias towards improving downstream tasks.
We conjecture an explanation for this inductive bias by exploring the connection between stacking and looped models.
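As a rough sketch of the mechanism (not the paper's implementation): stacking grows a network mid-training by initializing new layers as copies of already-trained ones. The snippet below duplicates a middle slice of layers; reading MIDAS as middle-layer stacking, along with the slice size and growth schedule, is an assumption here.

```python
import copy
import torch.nn as nn

def grow_by_stacking(layers: nn.ModuleList, num_new: int) -> nn.ModuleList:
    """Deepen a model by inserting copies of a middle slice of trained layers.

    Assumption: duplicating middle layers approximates the MIDAS variant;
    other stacking schemes duplicate different blocks (e.g., the top one).
    """
    mid = len(layers) // 2
    start = max(0, mid - num_new // 2)
    end = min(start + num_new, len(layers))
    copies = [copy.deepcopy(layers[i]) for i in range(start, end)]
    return nn.ModuleList(list(layers[:mid]) + copies + list(layers[mid:]))
```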
arXiv Detail & Related papers (2024-09-27T17:58:21Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
Efficient training and inference algorithms, such as low-rank computation and model pruning, have shown impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving efficiency.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
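For reference, magnitude-based pruning keeps only the largest-magnitude weights. A minimal unstructured, one-shot version is below; the paper's exact pruning setup may differ.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of entries with the smallest magnitudes."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest |w|
    return weight * (weight.abs() > threshold).to(weight.dtype)
```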
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Towards Exact Computation of Inductive Bias [8.988109761916379]
We propose a novel method for efficiently computing the inductive bias required for generalization on a task.
We show that higher dimensional tasks require greater inductive bias.
Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures.
arXiv Detail & Related papers (2024-06-22T21:14:24Z)
- Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning [52.70210390424605]
In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature.
In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits.
We propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives.
The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks.
arXiv Detail & Related papers (2024-04-16T04:52:41Z)
- Instilling Inductive Biases with Subnetworks [19.444844580405594]
Subtask Induction instills inductive biases towards solutions that utilize a specific subtask.
We show that Subtask Induction significantly reduces the amount of training data required for a model to adopt a specific, generalizable solution.
arXiv Detail & Related papers (2023-10-17T00:12:19Z)
- SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation [75.14793516745374]
We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data.
Our experiments show that our method imparts the desired inductive bias, resulting in better few-shot learning for FST-like tasks.
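To make the pre-training recipe concrete: one can sample random finite-state transducers (FSTs), serialize each one, and train the model to map (FST description, input) to the FST's output. The encoding below is our own illustrative choice, not SIP's actual format.

```python
import random

def random_fst(num_states=3, alphabet="ab"):
    """A random deterministic FST: (state, in_symbol) -> (next_state, out_symbol)."""
    return {(q, a): (random.randrange(num_states), random.choice(alphabet))
            for q in range(num_states) for a in alphabet}

def run_fst(fst, inp):
    """Simulate the FST on an input string, starting from state 0."""
    state, out = 0, []
    for ch in inp:
        state, emitted = fst[(state, ch)]
        out.append(emitted)
    return "".join(out)

def make_pretraining_example(length=6, alphabet="ab"):
    """Seq2seq pair: serialized transitions + input -> simulated output."""
    fst = random_fst(alphabet=alphabet)
    inp = "".join(random.choice(alphabet) for _ in range(length))
    desc = " ".join(f"{q},{a}->{q2},{b}" for (q, a), (q2, b) in sorted(fst.items()))
    return f"{desc} | {inp}", run_fst(fst, inp)
```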
arXiv Detail & Related papers (2023-10-01T21:19:12Z)
- Training a First-Order Theorem Prover from Synthetic Data [50.23600875138756]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms.
Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps.
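Schematically, such data can be produced by forward inference: start from the axioms and repeatedly combine known facts to derive new theorems, which then serve as training problems. The `step` callback below stands in for the prover's actual inference rule (e.g., resolution) and is a hypothetical placeholder.

```python
import random

def forward_generate(axioms, step, n_steps=100):
    """Derive synthetic theorems from axioms alone, with no human proof data.

    `step(a, b)` (hypothetical) combines two known facts into a new fact,
    or returns None when they do not combine.
    """
    known, derived = list(axioms), []
    for _ in range(n_steps):
        a, b = random.sample(known, 2)
        new = step(a, b)
        if new is not None and new not in known:
            known.append(new)
            derived.append(new)  # each derived fact becomes a training theorem
    return derived
```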
arXiv Detail & Related papers (2021-03-05T17:01:34Z)
- What they do when in doubt: a study of inductive biases in seq2seq learners [22.678902168856624]
We study how popular seq2seq learners generalize in tasks that have high ambiguity in the training data.
We connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases.
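A standard way to operationalize description length is prequential coding: encode each target with a model trained on all preceding examples and sum the code lengths, so a learner whose inductive bias fits the task compresses the data into fewer bits. Whether this is the exact estimator used in the paper is not stated in the summary; `fit_predict_proba` is a hypothetical callback.

```python
import math

def prequential_code_length(dataset, fit_predict_proba):
    """Sum of -log2 p(y_i | x_i) under a model refit on examples 0..i-1.

    `fit_predict_proba(prefix, x, y)` (hypothetical) trains on `prefix`
    and returns the probability assigned to the true target `y`.
    """
    total_bits = 0.0
    for i, (x, y) in enumerate(dataset):
        p = fit_predict_proba(dataset[:i], x, y)
        total_bits += -math.log2(max(p, 1e-12))  # clip to avoid log(0)
    return total_bits
```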
arXiv Detail & Related papers (2020-06-26T12:43:10Z)