Is Incoherence Surprising? Targeted Evaluation of Coherence Prediction from Language Models
- URL: http://arxiv.org/abs/2105.03495v1
- Date: Fri, 7 May 2021 20:28:33 GMT
- Title: Is Incoherence Surprising? Targeted Evaluation of Coherence Prediction from Language Models
- Authors: Anne Beyer and Sharid Loáiciga and David Schlangen
- Abstract summary: We design an extendable set of test suites addressing different aspects of discourse and dialogue coherence.
Unlike most previous coherence evaluation studies, we address specific linguistic devices beyond sentence order perturbations.
We show that this paradigm is equally suited to evaluate linguistic qualities that contribute to the notion of coherence.
- Score: 7.5413579967970605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Coherent discourse is distinguished from a mere collection of utterances by
the satisfaction of a diverse set of constraints, for example choice of
expression, logical relation between denoted events, and implicit compatibility
with world-knowledge. Do neural language models encode such constraints? We
design an extendable set of test suites addressing different aspects of
discourse and dialogue coherence. Unlike most previous coherence evaluation
studies, we address specific linguistic devices beyond sentence order
perturbations, allowing for a more fine-grained analysis of what constitutes
coherence and what neural models trained on a language modelling objective do
encode. Extending the targeted evaluation paradigm for neural language models
(Marvin and Linzen, 2018) to phenomena beyond syntax, we show that this
paradigm is equally suited to evaluate linguistic qualities that contribute to
the notion of coherence.
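The targeted evaluation paradigm the abstract refers to amounts to comparing a language model's surprisal on minimally different coherent and incoherent variants of the same material. The sketch below illustrates that idea with an off-the-shelf GPT-2 model from the transformers library; the model choice, the example sentence pair, and the continuation_nll helper are illustrative assumptions, not the authors' released code or test suites.

```python
# A minimal sketch of targeted evaluation for coherence: score a coherent vs. an
# incoherent continuation of the same context under a pretrained causal LM and
# check which one the model finds less surprising.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def continuation_nll(context: str, continuation: str) -> float:
    """Average negative log-likelihood of `continuation` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100  # ignore context tokens in the loss
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean NLL over scored tokens
    return loss.item()


context = "Ann was hungry, so she went to the kitchen."
coherent = " She made herself a sandwich."
incoherent = " Nevertheless, she made herself a sandwich."

# The paradigm counts a success when the coherent variant is less surprising.
print(continuation_nll(context, coherent) < continuation_nll(context, incoherent))
```

A test suite in this paradigm aggregates such pairwise comparisons over many items targeting one linguistic device (e.g. connective choice) and reports how often the model assigns lower surprisal to the coherent variant.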
Related papers
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Constructing Word-Context-Coupled Space Aligned with Associative Knowledge Relations for Interpretable Language Modeling [0.0]
The black-box structure of the deep neural network in pre-trained language models seriously limits the interpretability of the language modeling process.
A Word-Context-Coupled Space (W2CSpace) is proposed, which aligns uninterpretable neural representations with interpretable statistical logic.
The resulting language model achieves better performance and highly credible interpretability compared to related state-of-the-art methods.
arXiv Detail & Related papers (2023-05-19T09:26:02Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models [40.83377935276978]
This study applies causal mediation analysis to pre-trained neural language models.
We investigate the magnitude of models' preferences for grammatical inflections.
We observe two distinct mechanisms for producing subject-verb agreement depending on the syntactic structure.
arXiv Detail & Related papers (2021-06-10T23:50:51Z)
- Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z)
- Does injecting linguistic structure into language models lead to better alignment with brain recordings? [13.880819301385854]
We evaluate whether language models align better with brain recordings if their attention is biased by annotations from syntactic or semantic formalisms.
Our proposed approach enables the evaluation of more targeted hypotheses about the composition of meaning in the brain.
arXiv Detail & Related papers (2021-01-29T14:42:02Z)
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z)
- Analyzing Neural Discourse Coherence Models [17.894463722947542]
We investigate how well current models of coherence can capture aspects of text implicated in discourse organisation.
We devise two datasets of various linguistic alterations that undermine coherence and test model sensitivity to changes in syntax and semantics.
arXiv Detail & Related papers (2020-11-12T10:44:41Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)