Accounting for Agreement Phenomena in Sentence Comprehension with
Transformer Language Models: Effects of Similarity-based Interference on
Surprisal and Attention
- URL: http://arxiv.org/abs/2104.12874v1
- Date: Mon, 26 Apr 2021 20:46:54 GMT
- Authors: Soo Hyun Ryu and Richard L. Lewis
- Abstract summary: We advance an explanation of similarity-based interference effects in subject-verb and reflexive pronoun agreement processing.
We show that surprisal of the verb or reflexive pronoun predicts facilitatory interference effects in ungrammatical sentences.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We advance a novel explanation of similarity-based interference effects in
subject-verb and reflexive pronoun agreement processing, grounded in surprisal
values computed from a pretrained large-scale Transformer model, GPT-2.
Specifically, we show that surprisal of the verb or reflexive pronoun predicts
facilitatory interference effects in ungrammatical sentences, where a
distractor noun that matches in number with the verb or pronoun leads to faster
reading times, despite the distractor not participating in the agreement
relation. We review the human empirical evidence for such effects, including
recent meta-analyses and large-scale studies. We also show that attention
patterns (indexed by entropy and other measures) in the Transformer show
patterns of diffuse attention in the presence of similar distractors,
consistent with cue-based retrieval models of parsing. But in contrast to these
models, the attentional cues and memory representations are learned entirely
from the simple self-supervised task of predicting the next word.
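As a rough illustration of the two quantities the abstract relies on, the sketch below computes surprisal from a next-word probability and Shannon entropy over one attention head's weights. The probabilities here are made up for illustration, not drawn from GPT-2.

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(prob)

def attention_entropy(weights) -> float:
    """Shannon entropy (bits) of one attention head's weights.
    Higher entropy means more diffuse attention over prior tokens."""
    return -sum(w * math.log2(w) for w in weights if w > 0)

# Hypothetical probabilities for a verb under a number-matching
# vs. number-mismatching distractor noun.
p_match, p_mismatch = 0.20, 0.05
print(surprisal(p_match))     # lower surprisal, faster predicted reading
print(surprisal(p_mismatch))  # higher surprisal, slower predicted reading

# Focused vs. diffuse attention over four candidate antecedents.
focused = [0.85, 0.05, 0.05, 0.05]
diffuse = [0.40, 0.30, 0.20, 0.10]
print(attention_entropy(focused) < attention_entropy(diffuse))  # True
```

On this toy scale, diffuse attention in the presence of a similar distractor shows up directly as higher entropy, which is the pattern the abstract reports for the Transformer's attention heads.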
Related papers
- Counterfactual Generation from Language Models [64.55296662926919]
We show that counterfactual reasoning is conceptually distinct from interventions.
We propose a framework for generating true string counterfactuals.
Our experiments demonstrate that the approach produces meaningful counterfactuals.
arXiv Detail & Related papers (2024-11-11T17:57:30Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the interpretability limitations of Transformers by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- When to generate hedges in peer-tutoring interactions [1.0466434989449724]
The study uses a naturalistic face-to-face dataset annotated for natural language turns, conversational strategies, tutoring strategies, and nonverbal behaviours.
Results show that embedding layers, which capture the semantic information of the previous turns, significantly improve the model's performance.
We discover that the eye gaze of both the tutor and the tutee has a significant impact on hedge prediction.
arXiv Detail & Related papers (2023-07-28T14:29:19Z)
- An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech [17.07957283733822]
We develop an information-theoretic framework whereby we represent each phonetic category as a distribution over discrete units.
Our study demonstrates that the entropy of phonetic distributions reflects the variability of the underlying speech sounds.
While our study confirms the lack of direct, one-to-one correspondence, we find an intriguing, indirect relationship between phonetic categories and discrete units.
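The entropy measure described above can be sketched as follows; the unit IDs are hypothetical stand-ins for a discrete speech codebook, not output from any real model.

```python
import math
from collections import Counter

def unit_entropy(unit_ids) -> float:
    """Shannon entropy (bits) of a phone's empirical distribution
    over discrete units; higher entropy means the phone's acoustic
    realizations map onto more varied units."""
    counts = Counter(unit_ids)
    n = len(unit_ids)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical unit sequences: a stable vowel vs. a variable stop.
stable = [12, 12, 12, 12, 7, 12]
variable = [3, 41, 3, 88, 19, 41]
print(unit_entropy(stable) < unit_entropy(variable))  # True
```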
arXiv Detail & Related papers (2023-06-04T16:52:11Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
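As a hedged sketch of the Shapley-value idea behind this kind of head pruning, the toy below computes exact Shapley values for three hypothetical attention heads against a made-up accuracy table; the paper's actual estimation procedure is not reproduced here.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by enumerating orderings; feasible
    for the handful of attention heads sketched here."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: v / len(perms) for p, v in phi.items()}

# Toy "accuracy" of a model keeping a subset of 3 heads; head "h2"
# interferes with the target language in this made-up table.
acc = {frozenset(): 0.50,
       frozenset({"h0"}): 0.70, frozenset({"h1"}): 0.65, frozenset({"h2"}): 0.45,
       frozenset({"h0", "h1"}): 0.80, frozenset({"h0", "h2"}): 0.66,
       frozenset({"h1", "h2"}): 0.60, frozenset({"h0", "h1", "h2"}): 0.75}
phi = shapley_values(["h0", "h1", "h2"], acc.get)
# Heads with negative Shapley value are pruning candidates.
print(sorted(phi, key=phi.get))  # ['h2', 'h1', 'h0']
```

The interfering head ends up with a negative Shapley value, so pruning it is predicted to help the target language, which is the intuition the paper builds on.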
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings [14.244787327283335]
We find that the performance of Transformer models as sentence encoders can be improved by training with multi-modal multi-task losses.
The reliance of our framework on unpaired non-linguistic data makes it language-agnostic, enabling it to be widely applicable beyond English NLP.
arXiv Detail & Related papers (2022-09-20T03:01:45Z)
- Towards Disentangled Speech Representations [65.7834494783044]
We construct a representation learning task based on joint modeling of ASR and TTS.
We seek to learn a representation of audio that separates the part of the speech signal relevant to transcription from the part that is not.
We show that enforcing these properties during training improves WER by 24.5% relative on average for our joint modeling task.
arXiv Detail & Related papers (2022-08-28T10:03:55Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- Probing for Bridging Inference in Transformer Language Models [15.216901057561428]
We first investigate individual attention heads in BERT and observe that attention heads at higher layers prominently focus on bridging relations.
We consider language models as a whole in our approach where bridging anaphora resolution is formulated as a masked token prediction task.
Our formulation produces promising results without any fine-tuning, which indicates that pre-trained language models substantially capture bridging inference.
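A minimal sketch of the masked-token formulation: candidate antecedents fill a cloze built around the bridging anaphor, and the best-scoring filler wins. The probability table below is a hypothetical stand-in for a pretrained masked LM such as BERT.

```python
def resolve_bridge(anaphor: str, candidates, p_mask):
    """Pick the antecedent that best fills
    'the {anaphor} of the [MASK]' under the (stand-in) LM."""
    return max(candidates, key=lambda c: p_mask.get((anaphor, c), 0.0))

# Toy probabilities P([MASK]=candidate | "the door of the [MASK]").
p_mask = {("door", "house"): 0.62, ("door", "street"): 0.03}
print(resolve_bridge("door", ["house", "street"], p_mask))  # house
```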
arXiv Detail & Related papers (2021-04-19T15:42:24Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.