Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text
Correspondence
- URL: http://arxiv.org/abs/2205.03815v1
- Date: Sun, 8 May 2022 08:37:36 GMT
- Title: Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text
Correspondence
- Authors: Myeongjun Jang, Frank Mtumbuka, Thomas Lukasiewicz
- Abstract summary: We show that large-size pre-trained language models (PLMs) do not satisfy the logical negation property (LNP).
We propose a novel intermediate training task, named meaning-matching, designed to directly learn a meaning-text correspondence.
We find that the task enables PLMs to learn lexical semantic information.
- Score: 45.9949173746044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The logical negation property (LNP), which implies generating different
predictions for semantically opposite inputs, is an important property that a
trustworthy language model must satisfy. However, much recent evidence shows
that large-size pre-trained language models (PLMs) do not satisfy this
property. In this paper, we perform experiments using probing tasks to assess
PLMs' understanding of the LNP. Unlike previous studies that only examined negation
expressions, we expand the boundary of the investigation to lexical semantics.
Through experiments, we observe that PLMs violate the LNP frequently. To
alleviate the issue, we propose a novel intermediate training task, named
meaning-matching, designed to directly learn a meaning-text correspondence,
instead of relying on the distributional hypothesis. Through multiple
experiments, we find that the task enables PLMs to learn lexical semantic
information. Also, through fine-tuning experiments on 7 GLUE tasks, we confirm
that it is a safe intermediate task that guarantees similar or better
performance on downstream tasks. Finally, we observe that our proposed approach
outperforms previous counterparts while being more time- and resource-efficient.
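To make the LNP concrete, here is a minimal, hypothetical sketch of a negation probe with a masked language model: the same cloze sentence with and without "not" should receive different top predictions. The probe sentences, the model choice (bert-base-uncased), and the equality check are illustrative assumptions; the paper's actual probing tasks and datasets may differ.

```python
# Hypothetical LNP probe: semantically opposite cloze inputs should not
# receive the same top prediction from a pre-trained masked language model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # assumed model choice

probe_pairs = [
    ("A robin is a [MASK].", "A robin is not a [MASK]."),
    ("Water is a kind of [MASK].", "Water is not a kind of [MASK]."),
]

for affirmative, negated in probe_pairs:
    top_aff = fill_mask(affirmative)[0]["token_str"]
    top_neg = fill_mask(negated)[0]["token_str"]
    # An LNP violation: the negated input receives the same top prediction.
    print(f"{affirmative} -> {top_aff} | {negated} -> {top_neg} | "
          f"violation: {top_aff == top_neg}")
```

The proposed meaning-matching idea can likewise be pictured as a sequence-pair classification problem: the model sees a word together with a candidate meaning and predicts whether they correspond, so the training signal comes from meaning-text pairs rather than co-occurrence alone. The example triples, negative pair, and classification-head setup below are assumptions for illustration, not the paper's exact task construction.

```python
# Hypothetical meaning-matching sketch: binary classification over
# (word, candidate definition) pairs; label 1 means the pair corresponds.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

examples = [
    ("ascend", "to move or climb upward", 1),
    ("ascend", "to move or climb downward", 0),  # mismatched (negative) pair
]
words, definitions, labels = zip(*examples)

batch = tokenizer(list(words), list(definitions), padding=True, return_tensors="pt")
loss = model(**batch, labels=torch.tensor(labels)).loss  # cross-entropy over the two labels
loss.backward()  # one illustrative training step (optimizer omitted)
```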
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z) - Explicit Inductive Inference using Large Language Models [13.767536248988268]
Large Language Models (LLMs) are reported to hold undesirable attestation bias on inference tasks.
We propose a pipeline that exploits this bias to do explicit inductive inference.
arXiv Detail & Related papers (2024-08-26T17:58:17Z) - Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data [3.376269351435396]
We develop a formal perspective on probing using structural causal models (SCMs).
We extend a recent study of LMs in the context of a synthetic grid-world navigation task.
Our techniques provide robust empirical evidence for the ability of LMs to induce the latent concepts underlying text.
arXiv Detail & Related papers (2024-07-18T17:59:27Z) - What Do Language Models Learn in Context? The Structured Task Hypothesis [89.65045443150889]
Large language models (LLMs) learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL).
One popular hypothesis explains ICL by task selection.
Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration.
arXiv Detail & Related papers (2024-06-06T16:15:34Z) - Hypothesis Search: Inductive Reasoning with Language Models [39.03846394586811]
Recent work evaluates large language models on inductive reasoning tasks by directly prompting them, i.e., "in-context learning".
This works well for straightforward inductive tasks but performs poorly on complex tasks such as the Abstraction and Reasoning Corpus (ARC).
In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction.
arXiv Detail & Related papers (2023-09-11T17:56:57Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word
Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z) - Is Supervised Syntactic Parsing Beneficial for Language Understanding?
An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)