Structural Supervision Improves Few-Shot Learning and Syntactic
Generalization in Neural Language Models
- URL: http://arxiv.org/abs/2010.05725v1
- Date: Mon, 12 Oct 2020 14:12:37 GMT
- Title: Structural Supervision Improves Few-Shot Learning and Syntactic
Generalization in Neural Language Models
- Authors: Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy
and Miguel Ballesteros
- Abstract summary: Humans can learn structural properties about a word from minimal experience.
We assess the ability of modern neural language models to reproduce this behavior in English.
- Score: 47.42249565529833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans can learn structural properties about a word from minimal experience,
and deploy their learned syntactic representations uniformly in different
grammatical contexts. We assess the ability of modern neural language models to
reproduce this behavior in English and evaluate the effect of structural
supervision on learning outcomes. First, we assess few-shot learning
capabilities by developing controlled experiments that probe models' syntactic
nominal number and verbal argument structure generalizations for tokens seen as
few as two times during training. Second, we assess invariance properties of
learned representation: the ability of a model to transfer syntactic
generalizations from a base context (e.g., a simple declarative active-voice
sentence) to a transformed context (e.g., an interrogative sentence). We test
four models trained on the same dataset: an n-gram baseline, an LSTM, and two
LSTM-variants trained with explicit structural supervision (Dyer et al., 2016;
Charniak et al., 2016). We find that in most cases, the neural models are able
to induce the proper syntactic generalizations after minimal exposure, often
from just two examples during training, and that the two structurally
supervised models generalize more accurately than the LSTM model. All neural
models are able to leverage information learned in base contexts to drive
expectations in transformed contexts, indicating that they have learned some
invariance properties of syntax.
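One common way to implement the probes described above is to compare the probability (equivalently, the surprisal) a trained model assigns to a grammatical continuation against a minimally different ungrammatical one. The sketch below is a minimal illustration of such a nominal-number comparison, assuming the Hugging Face transformers library and an off-the-shelf GPT-2 as a stand-in scorer (the paper's n-gram, LSTM, and structurally supervised models are not used here); the test item built around the rare noun "tomographer" is hypothetical, not one of the paper's stimuli.

# Minimal sketch of a number-agreement probe with an off-the-shelf language
# model. GPT-2 is an illustrative stand-in, not one of the paper's models.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Total log-probability of `continuation` given `prefix`.
    The continuation should start with a space so GPT-2's BPE splits cleanly."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total, offset = 0.0, prefix_ids.size(1)
    for i in range(cont_ids.size(1)):
        # Logits at position offset+i-1 predict the token at position offset+i.
        total += log_probs[0, offset + i - 1, input_ids[0, offset + i]].item()
    return total

# Hypothetical item: a singular rare noun ("tomographer") with a plural
# distractor; a model with the right generalization should prefer "is".
prefix = "The tomographer near the lawyers"
diff = continuation_logprob(prefix, " is") - continuation_logprob(prefix, " are")
print(f"log P(is) - log P(are) = {diff:+.3f} (positive = correct singular preference)")

In the paper's setting, the analogous comparison is made for tokens whose training frequency is controlled (seen as few as two times) and is repeated in both a base context and a syntactically transformed one to test whether the learned generalization transfers.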
Related papers
- Toward Understanding In-context vs. In-weight Learning [50.24035812301655]
We identify simplified distributional properties that give rise to the emergence and disappearance of in-context learning.
We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
arXiv Detail & Related papers (2024-10-30T14:09:00Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate the limited interpretability of Transformers by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training, while structural generalization cases are often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
- Learning Disentangled Representations for Natural Language Definitions [0.0]
We argue that recurrent syntactic and semantic regularities in textual data can be used to provide the models with both structural biases and generative factors.
We leverage the semantic structures present in a representative and semantically dense category of sentence types, definitional sentences, for training a Variational Autoencoder to learn disentangled representations.
arXiv Detail & Related papers (2022-09-22T14:31:55Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models [23.21767225871304]
Sequence-to-sequence (seq2seq) models often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations.
We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not.
arXiv Detail & Related papers (2022-03-17T15:46:53Z)
- Syntactic Persistence in Language Models: Priming as a Window into Abstract Language Representations [0.38498574327875945]
We investigate the extent to which modern, neural language models are susceptible to syntactic priming.
We introduce a novel metric and release Prime-LM, a large corpus where we control for various linguistic factors which interact with priming strength.
We report surprisingly strong priming effects when priming with multiple sentences, each with different words and meaning but with identical syntactic structure.
arXiv Detail & Related papers (2021-09-30T10:38:38Z)
- Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language? [41.649440404203595]
We introduce a method for evaluating whether neural models can learn systematicity of monotonicity inference in natural language.
We consider four aspects of monotonicity inferences and test whether the models can systematically interpret lexical and logical phenomena on different training/test splits.
arXiv Detail & Related papers (2020-04-30T14:48:39Z)
- Overestimation of Syntactic Representation in Neural Language Models [16.765097098482286]
One popular method for determining a model's ability to induce syntactic structure trains a model on strings generated according to a template and then tests its ability to distinguish such strings from superficially similar ones with different syntax.
We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models.
arXiv Detail & Related papers (2020-04-10T15:13:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.