Overestimation of Syntactic Representation in Neural Language Models
- URL: http://arxiv.org/abs/2004.05067v1
- Date: Fri, 10 Apr 2020 15:13:03 GMT
- Title: Overestimation of Syntactic Representation in Neural Language Models
- Authors: Jordan Kodner, Nitish Gupta
- Abstract summary: One popular method for determining a model's ability to induce syntactic structure trains a model on strings generated according to a template, then tests the model's ability to distinguish such strings from superficially similar ones with different syntax.
We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models.
- Score: 16.765097098482286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of powerful neural language models over the last few years,
research attention has increasingly focused on what aspects of language they
represent that make them so successful. Several testing methodologies have been
developed to probe models' syntactic representations. One popular method for
determining a model's ability to induce syntactic structure trains a model on
strings generated according to a template, then tests the model's ability to
distinguish such strings from superficially similar ones with different syntax.
We illustrate a fundamental problem with this approach by reproducing positive
results from a recent paper with two non-syntactic baseline language models: an
n-gram model and an LSTM model trained on scrambled inputs.
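As a rough illustration of the non-syntactic baselines described above, the sketch below trains a bigram model (with add-one smoothing) on strings generated from a toy template and measures how often it prefers in-template strings over scrambled ones in a forced choice. The template, vocabulary, and evaluation sizes are invented for illustration and are not the paper's materials.

```python
# Minimal sketch, assuming a toy template; not the paper's experimental setup.
import math
import random
from collections import Counter

random.seed(0)

NOUNS = ["dog", "cat", "bird"]
VERBS = ["sees", "chases", "hears"]

def template_sentence():
    # Template: "the N V the N"
    return ["the", random.choice(NOUNS), random.choice(VERBS),
            "the", random.choice(NOUNS)]

def scrambled_sentence():
    # Superficially similar string: same words, shuffled order.
    s = template_sentence()
    random.shuffle(s)
    return s

# "Train" the bigram baseline on template-generated strings only.
unigrams, bigrams = Counter(), Counter()
for _ in range(5000):
    toks = ["<s>"] + template_sentence() + ["</s>"]
    unigrams.update(toks[:-1])
    bigrams.update(zip(toks[:-1], toks[1:]))
vocab_size = len(set(unigrams) | {"</s>"})

def log_prob(sent):
    toks = ["<s>"] + sent + ["</s>"]
    lp = 0.0
    for a, b in zip(toks[:-1], toks[1:]):
        # Add-one smoothing so unseen bigrams get nonzero probability.
        lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
    return lp

# Forced choice: does the purely sequential model prefer the in-template string?
pairs = [(template_sentence(), scrambled_sentence()) for _ in range(1000)]
accuracy = sum(log_prob(g) > log_prob(b) for g, b in pairs) / len(pairs)
print(f"bigram baseline forced-choice accuracy: {accuracy:.2f}")
```

The point is that a purely sequential model with no syntactic structure can still separate the two sets, which is the confound the paper highlights.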
Related papers
- Linguistically Grounded Analysis of Language Models using Shapley Head Values [2.914115079173979]
We investigate the processing of morphosyntactic phenomena by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs).
Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions are handled.
Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information.
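A hedged sketch of the general setup: the paper attributes minimal-pair decisions to attention heads via Shapley Head Values, whereas the code below only measures a crude leave-one-out head-ablation effect on one invented BLiMP-style pair with HuggingFace BERT. The example pair and the pseudo-log-likelihood scoring shortcut are illustrative assumptions, not the SHV method itself.

```python
# Leave-one-out head ablation as a rough stand-in for head attribution.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pll(sentence, head_mask=None):
    """Pseudo-log-likelihood: mask each token in turn and sum its log-prob."""
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):               # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0), head_mask=head_mask).logits
        total += torch.log_softmax(logits[0, i], -1)[ids[i]].item()
    return total

good, bad = "The cats sleep on the sofa.", "The cats sleeps on the sofa."
L, H = model.config.num_hidden_layers, model.config.num_attention_heads
baseline = pll(good) - pll(bad)                    # preference with all heads on

for head in range(H):                              # last layer only; loop layers in a full run
    mask = torch.ones(L, H)
    mask[-1, head] = 0.0                           # ablate a single head
    effect = baseline - (pll(good, mask) - pll(bad, mask))
    print(f"layer {L - 1}, head {head}: contribution {effect:+.3f}")
```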
arXiv Detail & Related papers (2024-10-17T09:48:08Z)
- Explicit Word Density Estimation for Language Modelling [24.8651840630298]
In this work we propose a new family of language models based on NeuralODEs and the continuous analogue of Normalizing Flows and manage to improve on some of the baselines.
arXiv Detail & Related papers (2024-06-10T15:21:33Z)
- Learning to Diversify Neural Text Generation via Degenerative Model [39.961572541752005]
We propose a new approach to prevent degeneration problems by training two models.
We first train a model that is designed to amplify undesirable patterns.
We then enhance the diversity of the second model by focusing on patterns that the first model fails to learn.
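One plausible (but assumed) reading of this two-model recipe is to freeze the first, "degenerative" model and reweight the second model's token-level losses so that tokens the first model already predicts well contribute little. The sketch below implements that reweighting with toy tensors; the weighting rule is an illustration, not the paper's exact objective.

```python
# Reweighted LM loss: focus the second model on what the first fails to learn.
import torch
import torch.nn.functional as F

def reweighted_lm_loss(main_logits, degen_logits, targets, pad_id=0):
    """main_logits/degen_logits: (batch, seq, vocab); targets: (batch, seq)."""
    with torch.no_grad():
        # Probability the frozen degenerate model assigns to the gold token.
        p_degen = F.softmax(degen_logits, dim=-1).gather(
            -1, targets.unsqueeze(-1)).squeeze(-1)
        weights = 1.0 - p_degen            # easy for the degenerate model => low weight
    nll = F.cross_entropy(main_logits.transpose(1, 2), targets,
                          reduction="none", ignore_index=pad_id)
    mask = (targets != pad_id).float()
    return (weights * nll * mask).sum() / mask.sum().clamp(min=1.0)

# Toy usage with random tensors.
B, T, V = 2, 5, 11
loss = reweighted_lm_loss(torch.randn(B, T, V), torch.randn(B, T, V),
                          torch.randint(1, V, (B, T)))
print(loss.item())
```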
arXiv Detail & Related papers (2023-09-22T04:57:10Z)
- Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding [57.42429912884543]
We propose Diff-LM-Speech, Tetra-Diff-Speech and Tri-Diff-Speech to solve high dimensionality and waveform distortion problems.
We also introduce a prompt encoder structure based on a variational autoencoder and a prosody bottleneck to improve prompt representation ability.
Experimental results show that our proposed methods outperform baseline methods.
arXiv Detail & Related papers (2023-07-28T11:20:23Z)
- Language Model Cascades [72.18809575261498]
Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
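A minimal sketch of a cascade in ordinary Python, chaining a scratchpad step, an answer-extraction step, and a verifier step. The lm() function is a hypothetical stand-in for any text model, and none of this reproduces the paper's probabilistic-programming formalization.

```python
# Sample-then-verify cascade built from repeated calls to a single model.
import random

def lm(prompt: str) -> str:
    """Hypothetical LM call; replace with a real API or local model."""
    return random.choice(["42", "I think the answer is 42", "7"])

def cot_then_verify(question: str, samples: int = 5) -> str:
    candidates = []
    for _ in range(samples):
        # First LM call: produce a scratchpad / chain of thought.
        thought = lm(f"Q: {question}\nLet's think step by step.")
        # Second LM call: extract a final answer conditioned on the thought.
        answer = lm(f"{thought}\nTherefore, the final answer is:")
        # Third LM call acts as a verifier scoring the (question, answer) pair.
        verdict = lm(f"Question: {question}\nAnswer: {answer}\nIs this correct? yes/no:")
        candidates.append((verdict.strip().lower().startswith("yes"), answer))
    # Prefer verified answers; fall back to the first sample otherwise.
    verified = [a for ok, a in candidates if ok]
    return verified[0] if verified else candidates[0][1]

print(cot_then_verify("What is six times seven?"))
```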
arXiv Detail & Related papers (2022-07-21T07:35:18Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
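A toy sketch of the mixing step described above: the next-token distribution is formed as a self-attention-weighted mixture of per-context-token "dependency" distributions. The shapes and the softmax parameterization are assumptions for illustration, not the paper's exact formulation.

```python
# Next-token probability as an attention-weighted mixture of dependency distributions.
import torch
import torch.nn.functional as F

B, T, V, d = 2, 6, 50, 32
hidden = torch.randn(B, T, d)                  # contextual states for tokens 1..T

# Self-attention weights of the next position over the previous tokens.
q, k = torch.randn(B, 1, d), hidden            # query for the next position
attn = F.softmax(q @ k.transpose(1, 2) / d ** 0.5, dim=-1)   # (B, 1, T)

# A per-token "dependency" distribution over the vocabulary, e.g. how likely
# the next token is to depend on token j.
dep_head = torch.nn.Linear(d, V)
p_dep = F.softmax(dep_head(hidden), dim=-1)    # (B, T, V)

# Mixture: p(next) = sum_j attn_j * p_dep(. | token_j)
p_next = (attn @ p_dep).squeeze(1)             # (B, V)
print(p_next.sum(dim=-1))                      # each row sums to ~1
```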
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
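A minimal sketch of the retrieval step, assuming the fine-tuned model's final-layer representations have already been extracted; random vectors stand in for real encoder outputs, and the sklearn index is one convenient choice rather than the paper's implementation.

```python
# Retrieve nearest training examples for a test representation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_reprs = rng.normal(size=(1000, 768))     # e.g. [CLS] vectors of the train set
train_labels = rng.integers(0, 2, size=1000)

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(train_reprs)

test_repr = rng.normal(size=(1, 768))          # representation of one test input
dist, idx = index.kneighbors(test_repr)
print("nearest training examples:", idx[0])
# If the retrieved neighbours share a surface cue but not the label semantics,
# that is the kind of spurious association this approach can surface.
print("their labels:", train_labels[idx[0]])
```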
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models [47.42249565529833]
Humans can learn structural properties about a word from minimal experience.
We assess the ability of modern neural language models to reproduce this behavior in English.
arXiv Detail & Related papers (2020-10-12T14:12:37Z)
- Neural Baselines for Word Alignment [0.0]
We study and evaluate neural models for unsupervised word alignment for four language pairs.
We show that neural versions of the IBM-1 and hidden Markov models vastly outperform their discrete counterparts.
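For context, a toy sketch of what a "neural IBM-1" style objective can look like: translation probabilities come from an embedding plus softmax, and the latent alignment is marginalized uniformly over source positions. Vocabulary sizes, dimensions, and the uniform alignment prior are assumptions, not the paper's configuration.

```python
# Neural parameterization of an IBM-1 style alignment likelihood.
import math
import torch
import torch.nn.functional as F

src_vocab, tgt_vocab, dim = 100, 120, 64
src_emb = torch.nn.Embedding(src_vocab, dim)
proj = torch.nn.Linear(dim, tgt_vocab)

def ibm1_nll(src_ids, tgt_ids):
    """Negative log-likelihood of target words under a uniform-alignment IBM-1."""
    # log p(f | e_i) for every source position i: shape (src_len, tgt_vocab)
    trans = F.log_softmax(proj(src_emb(src_ids)), dim=-1)
    # Marginalize the latent alignment uniformly over source positions:
    # log p(f_j) = logsumexp_i [ log(1/I) + log p(f_j | e_i) ]
    log_p = torch.logsumexp(trans[:, tgt_ids] - math.log(len(src_ids)), dim=0)
    return -log_p.sum()

src = torch.randint(0, src_vocab, (7,))
tgt = torch.randint(0, tgt_vocab, (6,))
loss = ibm1_nll(src, tgt)
loss.backward()    # the "neural" part: parameters are trained by gradient descent
print(loss.item())
```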
arXiv Detail & Related papers (2020-09-28T07:51:03Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
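A small numerical sketch of the timescale picture above: a unit with constant forget gate f retains information like f**t, i.e. an effective timescale of -1/log(f), and mixing units whose timescales are heavy-tailed yields aggregate memory that decays roughly as a power law. The exponent and ranges below are illustrative, not the paper's fitted values.

```python
# Power-law retention from a mixture of exponentially decaying memory units.
import numpy as np

rng = np.random.default_rng(0)

# Sample per-unit timescales from a heavy-tailed (Pareto-like) distribution.
taus = 1.0 + rng.pareto(a=1.5, size=2048)
forget_gates = np.exp(-1.0 / taus)                    # f = exp(-1/tau)

t = np.arange(1, 200)
memory = (forget_gates[:, None] ** t).mean(axis=0)    # aggregate retention curve

# On a log-log scale a power law is a straight line; estimate its slope.
slope = np.polyfit(np.log(t[10:]), np.log(memory[10:]), 1)[0]
print(f"approximate power-law exponent of aggregate decay: {slope:.2f}")
```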
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.