Pragmatic competence of pre-trained language models through the lens of
discourse connectives
- URL: http://arxiv.org/abs/2109.12951v1
- Date: Mon, 27 Sep 2021 11:04:41 GMT
- Authors: Lalchand Pandia, Yan Cong and Allyson Ettinger
- Abstract summary: As pre-trained language models (LMs) continue to dominate NLP, it is increasingly important that we understand the depth of language capabilities in these models.
We focus on testing models' ability to use pragmatic cues to predict discourse connectives.
We find that although models predict connectives reasonably well in the context of naturally-occurring data, when we control contexts to isolate high-level pragmatic cues, model sensitivity is much lower.
- Score: 4.917317902787791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As pre-trained language models (LMs) continue to dominate NLP, it is
increasingly important that we understand the depth of language capabilities in
these models. In this paper, we target pre-trained LMs' competence in
pragmatics, with a focus on pragmatics relating to discourse connectives. We
formulate cloze-style tests using a combination of naturally-occurring data and
controlled inputs drawn from psycholinguistics. We focus on testing models'
ability to use pragmatic cues to predict discourse connectives, models' ability
to understand implicatures relating to connectives, and the extent to which
models show humanlike preferences regarding temporal dynamics of connectives.
We find that although models predict connectives reasonably well in the context
of naturally-occurring data, when we control contexts to isolate high-level
pragmatic cues, model sensitivity is much lower. Models also do not show
substantial humanlike temporal preferences. Overall, the findings suggest that
at present, dominant pre-training paradigms do not result in substantial
pragmatic competence in our models.
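The cloze-style setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`make_cloze`, `rank_connectives`), the `[MASK]` placeholder, the example sentence, and the `toy_score` function are all assumptions made for illustration. In a real experiment, `toy_score` would be replaced by a score from a pre-trained LM, such as the probability it assigns to the connective in the masked slot.

```python
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"  # assumed placeholder for the connective slot


def make_cloze(left: str, right: str) -> str:
    """Join two clauses with a masked slot where the connective would go."""
    return f"{left} {MASK} {right}"


def rank_connectives(
    cloze: str,
    candidates: List[str],
    score: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Fill the slot with each candidate and sort by score, best first."""
    scored = [(c, score(cloze.replace(MASK, c, 1))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for a language model: rewards a connective when it
# co-occurs with a cue word in the same sentence. The weights are invented.
CUE_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("but", "failed"): 1.0,
    ("so", "passed"): 1.0,
}


def toy_score(sentence: str) -> float:
    """Sum the weights of every (connective, cue) pair present in the sentence."""
    words = sentence.lower().replace(",", "").replace(".", "").split()
    return sum(
        weight
        for (connective, cue), weight in CUE_WEIGHTS.items()
        if connective in words and cue in words
    )


cloze = make_cloze("She studied hard for the exam,", "she failed.")
ranking = rank_connectives(cloze, ["so", "but", "because"], toy_score)
print(cloze)
print(ranking)
```

Keeping the scorer as a plain callable means the same ranking code could be reused for naturally occurring sentences and for controlled items, with only the scoring model swapped out.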
Related papers
- Verbalized Probabilistic Graphical Modeling with Large Language Models [8.961720262676195]
This work introduces a novel Bayesian prompting approach that facilitates training-free Bayesian inference with large language models.
Our results indicate that the model effectively enhances confidence elicitation and text generation quality, demonstrating its potential to improve AI language understanding systems.
(2024-06-08T16:35:31Z)
- Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning [72.21876989058858]
We present a model of pragmatic language understanding, where utterances are produced and understood by searching for regularized equilibria of signaling games.
In this model, speakers and listeners search for contextually appropriate utterance-meaning mappings that are both close to game-theoretically optimal conventions and close to a shared, "default" semantics.
(2023-11-16T09:42:36Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
(2023-11-08T20:41:18Z)
- Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
(2023-10-24T06:15:15Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
(2023-06-04T15:44:51Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
(2023-01-30T13:21:00Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
(2022-11-09T18:58:29Z)
- Contextualization and Generalization in Entity and Relation Extraction [0.0]
We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks present important lexical overlap between mentions and relations used for training and evaluating models.
We propose empirical studies to separate performance based on mention and relation overlap with the training set.
(2022-06-15T14:16:42Z)
- A Survey of Knowledge Enhanced Pre-trained Models [28.160826399552462]
We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs).
These models demonstrate deep understanding and logical reasoning and introduce interpretability.
(2021-10-01T08:51:58Z)
- Labeling Explicit Discourse Relations using Pre-trained Language Models [0.0]
State-of-the-art models achieve an F-score slightly above 45% by using hand-crafted features.
We find that the pre-trained language models, when finetuned, are powerful enough to replace the linguistic features.
This is the first time a model has outperformed knowledge-intensive models without employing any linguistic features.
(2020-06-21T17:18:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.