Investigating representations of verb bias in neural language models
- URL: http://arxiv.org/abs/2010.02375v2
- Date: Thu, 15 Oct 2020 19:37:48 GMT
- Title: Investigating representations of verb bias in neural language models
- Authors: Robert D. Hawkins, Takateru Yamakoshi, Thomas L. Griffiths, Adele E. Goldberg
- Abstract summary: We introduce DAIS, a benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation.
This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments.
We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences.
- Score: 7.455546102930909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Languages typically provide more than one grammatical construction to express
certain types of messages. A speaker's choice of construction is known to
depend on multiple factors, including the choice of main verb -- a phenomenon
known as \emph{verb bias}. Here we introduce DAIS, a large benchmark dataset
containing 50K human judgments for 5K distinct sentence pairs in the English
dative alternation. This dataset includes 200 unique verbs and systematically
varies the definiteness and length of arguments. We use this dataset, as well
as an existing corpus of naturally occurring data, to evaluate how well recent
neural language models capture human preferences. Results show that larger
models perform better than smaller models, and transformer architectures (e.g.
GPT-2) tend to outperform recurrent architectures (e.g. LSTMs) even under
comparable parameter and training settings. Additional analyses of internal
feature representations suggest that transformers may better integrate specific
lexical information with grammatical constructions.
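The abstract does not spell out the scoring procedure, but a standard way to probe such preferences is to compare a model's total log-probability for the two alternants of a sentence pair. A minimal sketch using the Hugging Face `transformers` library (the model choice and the example pair are illustrative assumptions, not items from DAIS):

```python
# Sketch: compare GPT-2 log-probabilities of the two dative alternants.
# The scoring protocol is an assumption, not necessarily the paper's method.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, .loss is the mean NLL over the
        # (n - 1) predicted tokens; multiply back to get the total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Hypothetical pair in the style of DAIS (not taken from the dataset):
do = "The teacher gave the student a book."     # double-object
po = "The teacher gave a book to the student."  # prepositional dative
print("preference for DO:", sentence_logprob(do) - sentence_logprob(po))
```

A positive difference means the model assigns higher probability to the double-object variant; correlating such differences with DAIS human ratings is one way to quantify how well a model captures verb bias.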
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, and finding categories where one language model outperforms another.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
- Generative Spoken Language Model based on continuous word-sized audio tokens [52.081868603603844]
We introduce a Generative Spoken Language Model based on word-size continuous-valued audio embeddings.
The resulting model is the first generative language model based on word-size continuous embeddings.
arXiv Detail & Related papers (2023-10-08T16:46:14Z)
- How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning.
We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus.
arXiv Detail & Related papers (2023-05-31T14:38:14Z)
- How much pretraining data do language models need to learn syntax? [12.668478784932878]
Transformer-based pretrained language models achieve outstanding results in many well-known NLU benchmarks.
We study the impact of pretraining data size on the knowledge of the models using RoBERTa.
arXiv Detail & Related papers (2021-09-07T15:51:39Z)
- Structural Guidance for Transformer Language Models [24.00537240110055]
We study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models.
Experimental results provide converging evidence that generative structural supervision can induce more robust and human-like linguistic generalization.
arXiv Detail & Related papers (2021-07-30T23:14:51Z)
- What Context Features Can Transformer Language Models Use? [32.49689188570872]
We measure usable information by selectively ablating lexical and structural information in transformer language models trained on English Wikipedia.
In both mid- and long-range contexts, we find that several extremely destructive context manipulations remove less than 15% of the usable information.
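One concrete instance of such a manipulation is shuffling the word order of a distant context window, destroying structural information while keeping lexical content, and measuring how much the model's loss on a target span increases. A sketch of that probe, as an illustration of the general idea rather than the paper's exact protocol:

```python
# Sketch of a context-ablation probe: shuffle a distant context window and
# measure the increase in next-token loss on the target span.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def target_loss(context_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Mean negative log-likelihood on the target span, given the context."""
    ids = torch.cat([context_ids, target_ids]).unsqueeze(0)
    labels = ids.clone()
    labels[:, : context_ids.shape[0]] = -100  # ignore loss on context tokens
    with torch.no_grad():
        return model(ids, labels=labels).loss.item()

# Filler passage standing in for a real Wikipedia excerpt.
text = (
    "The committee met on Tuesday to discuss the new proposal. "
    "After a long debate, the members agreed to postpone the vote. "
) * 8
ids = tokenizer(text, return_tensors="pt").input_ids[0]
context, target = ids[:-50], ids[-50:]

# Ablation: shuffle the context, destroying word order but keeping the words.
shuffled = context[torch.randperm(context.shape[0])]
print("intact context  :", target_loss(context, target))
print("shuffled context:", target_loss(shuffled, target))
```

The gap between the two losses is a rough proxy for how much usable information the ablation removed.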
arXiv Detail & Related papers (2021-06-15T18:38:57Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction starting from nearly zero training examples, with models improving as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed thereafter, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
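The key idea is that output word embeddings are composed on the fly from each word's surface form instead of being stored in a vocabulary-sized matrix. A rough sketch of a character-compositional output layer (module structure and dimensions here are assumptions, not the paper's actual architecture):

```python
# Rough sketch: output logits are computed against word embeddings composed
# from character sequences, so no fixed vocabulary matrix is stored.
import torch
import torch.nn as nn

class CompositionalOutputLayer(nn.Module):
    def __init__(self, n_chars: int, char_dim: int, hidden_dim: int):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.composer = nn.GRU(char_dim, hidden_dim, batch_first=True)

    def embed_words(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (n_candidate_words, max_word_len) character indices
        _, h = self.composer(self.char_emb(char_ids))
        return h.squeeze(0)                      # (n_words, hidden_dim)

    def forward(self, hidden: torch.Tensor, char_ids: torch.Tensor):
        # hidden: (batch, hidden_dim) LM states; logits over candidate words
        word_embs = self.embed_words(char_ids)   # built on the fly
        return hidden @ word_embs.T              # (batch, n_words)
```

Because the layer's parameters depend only on the character inventory, its size is independent of how many word types appear in training.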
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
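Concretely, the factorization means the parameters serving a given (task, language) pair are generated from separate task and language codes, so an unseen combination can be composed from parts learned in seen combinations. A heavily simplified, deterministic sketch of that factorization (the paper infers these codes as Bayesian latent variables; all names and shapes below are assumptions):

```python
# Simplified sketch of parameter-space factorization: classifier weights for
# a (task, language) pair are generated from separate task and language
# embeddings, so unseen combinations can be composed zero-shot. The paper
# treats these codes as latent variables with Bayesian inference; this
# deterministic version only illustrates the factorization itself.
import torch
import torch.nn as nn

class FactorizedHead(nn.Module):
    def __init__(self, n_tasks, n_langs, z_dim, hidden_dim, n_classes):
        super().__init__()
        self.task_z = nn.Embedding(n_tasks, z_dim)
        self.lang_z = nn.Embedding(n_langs, z_dim)
        # Hypernetwork mapping [task; language] codes to classifier weights.
        self.hyper = nn.Linear(2 * z_dim, hidden_dim * n_classes)
        self.hidden_dim, self.n_classes = hidden_dim, n_classes

    def forward(self, feats, task_id, lang_id):
        z = torch.cat([self.task_z(task_id), self.lang_z(lang_id)], dim=-1)
        W = self.hyper(z).view(self.n_classes, self.hidden_dim)
        return feats @ W.T   # logits for this task-language combination
```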
arXiv Detail & Related papers (2020-01-30T16:58:56Z)