An Analysis of the Utility of Explicit Negative Examples to Improve the Syntactic Abilities of Neural Language Models
- URL: http://arxiv.org/abs/2004.02451v3
- Date: Sat, 6 Jun 2020 07:41:21 GMT
- Title: An Analysis of the Utility of Explicit Negative Examples to Improve the Syntactic Abilities of Neural Language Models
- Authors: Hiroshi Noji, Hiroya Takamura
- Abstract summary: We explore the utility of explicit negative examples in training neural language models.
We find that even with our direct learning signals the models still struggle to resolve agreement across an object-relative clause.
- Score: 32.183409062294466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the utility of explicit negative examples in training neural
language models. Negative examples here are incorrect words in a sentence, such
as "barks" in "*The dogs barks". Neural language models are commonly trained
only on positive examples, a set of sentences in the training data, but recent
studies suggest that the models trained in this way are not capable of robustly
handling complex syntactic constructions, such as long-distance agreement. In
this paper, using English data, we first demonstrate that appropriately using
negative examples about particular constructions (e.g., subject-verb agreement)
will boost the model's robustness on them, with a negligible loss of
perplexity. The key to our success is an additional margin loss between the
log-likelihoods of a correct word and an incorrect word. We then provide a
detailed analysis of the trained models. One of our findings is the difficulty
of object-relative clauses for RNNs. We find that even with our direct learning
signals the models still struggle to resolve agreement across an
object-relative clause. Augmentation of training sentences involving the
constructions somewhat helps, but the accuracy still does not reach the level
of subject-relative clauses. Although not directly cognitively appealing, our
method can be a tool to analyze the true architectural limitation of neural
models on challenging linguistic constructions.
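The margin loss mentioned in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the authors' released code: the helper name, the margin value, and the way each incorrect word is paired with its correct counterpart (e.g., "barks" vs. "bark" after "The dogs") are assumptions for the sake of the example.

    import torch
    import torch.nn.functional as F

    def agreement_margin_loss(logits, correct_ids, incorrect_ids, margin=1.0):
        """Hinge-style margin between the log-likelihoods of a correct word
        and its incorrect (negative-example) counterpart at the same position.

        logits:        (batch, vocab) LM scores at the target position
        correct_ids:   (batch,) ids of the grammatical word, e.g. "bark"
        incorrect_ids: (batch,) ids of the ungrammatical word, e.g. "barks"
        """
        log_probs = F.log_softmax(logits, dim=-1)
        lp_correct = log_probs.gather(1, correct_ids.unsqueeze(1)).squeeze(1)
        lp_incorrect = log_probs.gather(1, incorrect_ids.unsqueeze(1)).squeeze(1)
        # Penalize any case where the correct word does not beat the incorrect
        # one by at least `margin` in log-probability.
        return torch.clamp(margin - (lp_correct - lp_incorrect), min=0.0).mean()

In training, this term would presumably be added to the usual cross-entropy language-modeling loss and applied only at positions for which a negative example (e.g., a verb with the wrong number) is available, which is consistent with the abstract's claim of a negligible perplexity cost.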
Related papers
- ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to assess any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during the finetuning stage than during the pretraining stage.
arXiv Detail & Related papers (2022-12-16T05:15:41Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
Existing techniques for training language models can be misaligned with the truth.
We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.
We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences; a sketch of the unlikelihood idea is given after this list.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
- Detecting and Exorcising Statistical Demons from Language Models with Anti-Models of Negative Data [13.392212395386933]
We find that within a model family, as the number of parameters, training epochs, and data set size increase, so does a model's ability to generalize to negative n-gram data.
We propose a form of inductive bias that attenuates such undesirable signals with negative data distributions automatically learned from positive data.
arXiv Detail & Related papers (2020-10-22T16:45:32Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment [17.995905582226463]
We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish.
English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences.
arXiv Detail & Related papers (2020-05-01T01:21:47Z)
- Multi-Step Inference for Reasoning Over Paragraphs [95.91527524872832]
Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives.
We present a compositional model reminiscent of neural module networks that can perform chained logical reasoning.
arXiv Detail & Related papers (2020-04-06T21:12:53Z)
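As referenced in the negation entry above, the unlikelihood objective can be sketched as follows. This is a minimal illustration of the general unlikelihood idea (pushing down the probability of tokens that should not be predicted, here the original continuation of a negated sentence), with assumed input shapes; it is not the code of that paper.

    import torch
    import torch.nn.functional as F

    def unlikelihood_loss(logits, negative_ids):
        """Penalize probability mass on tokens that should NOT follow,
        e.g. the original completion of a sentence after it has been negated.

        logits:       (batch, vocab) LM scores at the target position
        negative_ids: (batch,) ids of the tokens to push down
        """
        probs = F.softmax(logits, dim=-1)
        p_neg = probs.gather(1, negative_ids.unsqueeze(1)).squeeze(1)
        # -log(1 - p) grows as the model assigns more mass to the unwanted token.
        return -torch.log(torch.clamp(1.0 - p_neg, min=1e-8)).mean()

Like the margin loss sketched above, this term would presumably be added to the standard likelihood objective rather than replace it.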
This list is automatically generated from the titles and abstracts of the papers on this site.