What they do when in doubt: a study of inductive biases in seq2seq
learners
- URL: http://arxiv.org/abs/2006.14953v2
- Date: Mon, 29 Mar 2021 09:43:36 GMT
- Title: What they do when in doubt: a study of inductive biases in seq2seq
learners
- Authors: Eugene Kharitonov and Rahma Chaabouni
- Abstract summary: We study how popular seq2seq learners generalize in tasks that have high ambiguity in the training data.
We connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases.
- Score: 22.678902168856624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence (seq2seq) learners are widely used, but we still have
only limited knowledge about what inductive biases shape the way they
generalize. We address that by investigating how popular seq2seq learners
generalize in tasks that have high ambiguity in the training data. We use SCAN
and three new tasks to study learners' preferences for memorization,
arithmetic, hierarchical, and compositional reasoning. Further, we connect to
Solomonoff's theory of induction and propose to use description length as a
principled and sensitive measure of inductive biases.
In our experimental study, we find that LSTM-based learners can learn to
perform counting, addition, and multiplication by a constant from a single
training example. Furthermore, Transformer- and LSTM-based learners show a bias
toward hierarchical induction over linear induction, while CNN-based learners
prefer the opposite. On the SCAN dataset, we find that CNN-based and, to a
lesser degree, Transformer- and LSTM-based learners have a preference for
compositional generalization over memorization. Finally, across all our
experiments, description length proved to be a sensitive measure of inductive
biases.
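The description-length measure the abstract proposes can be made concrete with a prequential (online) code: each example is encoded with the model fit on all preceding examples, and a learner whose inductive bias matches the data compresses it into fewer bits. The toy additive-smoothing frequency learner below is a hypothetical stand-in for the seq2seq models studied in the paper, used only to make the bookkeeping explicit:

```python
import math
from collections import defaultdict

def prequential_description_length(pairs, alpha=1.0, vocab_size=2):
    """Prequential (online) code length in bits for a label sequence.

    Each label is encoded under the model fit on all preceding examples;
    the running total is the description length of the data under the
    learner. The 'learner' here is a toy additive-smoothing frequency
    model, not one of the paper's seq2seq architectures.
    """
    counts = defaultdict(lambda: defaultdict(float))
    total_bits = 0.0
    for x, y in pairs:
        seen = sum(counts[x].values())
        p = (counts[x][y] + alpha) / (seen + alpha * vocab_size)
        total_bits += -math.log2(p)   # cost of encoding y under the current model
        counts[x][y] += 1.0           # then update the model with (x, y)
    return total_bits

# A learner whose bias matches the data compresses it into fewer bits:
consistent = [("a", 0)] * 8            # unambiguous mapping
ambiguous = [("a", 0), ("a", 1)] * 4   # high-ambiguity mapping
print(prequential_description_length(consistent))  # fewer bits
print(prequential_description_length(ambiguous))   # more bits
```

The gap between the two totals is the kind of signal the paper uses: the description length under a learner is small exactly when the data agrees with what the learner would prefer to induce.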
Related papers
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity and the rank of the RLM are strong and significant predictors of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- Dynamically Modular and Sparse General Continual Learning [13.976220447055521]
We introduce dynamic modularity and sparsity (Dynamos) for rehearsal-based general continual learning.
We show that our method learns representations that are modular and specialized, while maintaining reusability by activating subsets of neurons with overlaps corresponding to the similarity of stimuli.
arXiv Detail & Related papers (2023-01-02T12:24:24Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
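One simple way to quantify "diversity in the activation patterns of different neurons" in the spirit of this entry is the Shannon entropy of binarized activation patterns: low entropy means many inputs trigger the same few patterns (suggestive of specialized, memorizing units), high entropy means diverse, distributed responses. The estimator below is a simplified illustration of the idea, not the paper's exact information measure:

```python
import math

def activation_diversity(activations, threshold=0.0):
    """Shannon entropy (bits) of binarized activation patterns.

    `activations` is a list of per-example neuron activation vectors.
    Each vector is binarized (active / inactive) and treated as one
    pattern; the entropy of the pattern distribution is returned.
    This is an illustrative proxy, not the paper's estimator.
    """
    patterns = {}
    for vec in activations:
        key = tuple(v > threshold for v in vec)
        patterns[key] = patterns.get(key, 0) + 1
    n = len(activations)
    return sum((c / n) * math.log2(n / c) for c in patterns.values())

# Identical responses carry no information; varied responses do:
print(activation_diversity([[1.0, -1.0]] * 4))                      # 0.0 bits
print(activation_diversity([[1, -1], [-1, 1], [1, 1], [-1, -1]]))   # 2.0 bits
```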
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [30.610670366488943]
We replace architecture engineering by encoding inductive bias in datasets.
Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities.
Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks.
arXiv Detail & Related papers (2021-01-15T17:15:24Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
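As we read this entry, the failure-based scheme trains a deliberately biased network with a generalized cross-entropy loss (which emphasizes easy, bias-aligned examples) and then upweights, for the debiased network, the examples on which the biased one fails. The sketch below illustrates that relative-difficulty weighting with scalar probabilities; the function names and numbers are our own illustration, not the paper's code:

```python
import math

def gce_loss(p_true, q=0.7):
    """Generalized cross-entropy: emphasizes high-confidence examples,
    so a network trained with it amplifies its own (spurious) bias."""
    return (1.0 - p_true ** q) / q

def relative_difficulty(p_biased, p_debiased, eps=1e-12):
    """Weight for the debiased network's loss on one example: large when
    the biased network fails (the example conflicts with the spurious
    correlation), small when the biased network succeeds."""
    ce_b = -math.log(p_biased + eps)
    ce_d = -math.log(p_debiased + eps)
    return ce_b / (ce_b + ce_d + eps)

# Bias-aligned example: biased net is confident and right -> low weight.
print(relative_difficulty(p_biased=0.99, p_debiased=0.6))
# Bias-conflicting example: biased net fails -> weight near 1.
print(relative_difficulty(p_biased=0.05, p_debiased=0.6))
```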
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Universal linguistic inductive biases via meta-learning [36.43388942327124]
It is unclear which inductive biases can explain observed patterns in language acquisition.
We introduce a framework for giving linguistic inductive biases to a neural network model.
We demonstrate this framework with a case study based on syllable structure.
arXiv Detail & Related papers (2020-06-29T19:15:10Z)
- Rethink the Connections among Generalization, Memorization and the Spectral Bias of DNNs [44.5823185453399]
We show that the monotonicity of the learning bias does not always hold.
Under the experimental setup of deep double descent, the high-frequency components of DNNs diminish in the late stage of training.
arXiv Detail & Related papers (2020-04-29T04:24:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.