Modelling Commonsense Properties using Pre-Trained Bi-Encoders
- URL: http://arxiv.org/abs/2210.02771v1
- Date: Thu, 6 Oct 2022 09:17:34 GMT
- Title: Modelling Commonsense Properties using Pre-Trained Bi-Encoders
- Authors: Amit Gajbhiye, Luis Espinosa-Anke, Steven Schockaert
- Abstract summary: We study the possibility of fine-tuning language models to explicitly model concepts and their properties.
Our experimental results show that the resulting encoders allow us to predict commonsense properties with much higher accuracy than is possible by directly fine-tuning language models.
- Score: 40.327695801431375
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Grasping the commonsense properties of everyday concepts is an important
prerequisite to language understanding. While contextualised language models
are reportedly capable of predicting such commonsense properties with
human-level accuracy, we argue that such results have been inflated because of
the high similarity between training and test concepts. This means that models
which capture concept similarity can perform well, even if they do not capture
any knowledge of the commonsense properties themselves. In settings where there
is no overlap between the properties that are considered during training and
testing, we find that the empirical performance of standard language models
drops dramatically. To address this, we study the possibility of fine-tuning
language models to explicitly model concepts and their properties. In
particular, we train separate concept and property encoders on two types of
readily available data: extracted hyponym-hypernym pairs and generic sentences.
Our experimental results show that the resulting encoders allow us to predict
commonsense properties with much higher accuracy than is possible by directly
fine-tuning language models. We also present experimental results for the
related task of unsupervised hypernym discovery.
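The bi-encoder set-up described in the abstract can be illustrated with a minimal sketch: two separate encoders map concepts and properties into a shared space, and a concept-property pair is scored via a sigmoid of the dot product of the two embeddings. The linear maps below are toy stand-ins for the fine-tuned language-model encoders; all dimensions and inputs are illustrative, not taken from the paper.

```python
import numpy as np

def encode(vectors, W):
    # toy "encoder": a linear map followed by L2 normalisation,
    # standing in for a fine-tuned language-model encoder
    z = vectors @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
dim_in, dim_out = 8, 4
W_concept = rng.normal(size=(dim_in, dim_out))   # concept encoder parameters
W_property = rng.normal(size=(dim_in, dim_out))  # property encoder parameters

concepts = rng.normal(size=(3, dim_in))    # e.g. "banana", "lemon", "car"
properties = rng.normal(size=(2, dim_in))  # e.g. "is yellow", "has wheels"

c = encode(concepts, W_concept)
p = encode(properties, W_property)

# bi-encoder score: sigmoid of the dot product between the two embeddings
scores = 1.0 / (1.0 + np.exp(-(c @ p.T)))
print(scores.shape)  # (3, 2): one plausibility score per concept-property pair
```

Because the two encoders are independent, concept and property embeddings can be precomputed and cached, which is what makes the bi-encoder design attractive for scoring large numbers of concept-property pairs.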
Related papers
- CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models [16.436592723426305]
It is unclear whether language models produce the same value for different ways of assigning joint probabilities to word spans.
Our work introduces a novel framework, ConTestS, involving statistical tests to assess score consistency across interchangeable completion and conditioning orders.
arXiv Detail & Related papers (2024-09-30T06:24:43Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further refine the robustness metric: a model is judged robust only if its performance is consistently accurate across the examples within each clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format.
This approach improves performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
arXiv Detail & Related papers (2021-10-15T21:58:03Z)
- Distilling Relation Embeddings from Pre-trained Language Models [35.718167335989854]
We show that it is possible to distill relation embeddings from pre-trained language models.
We encode word pairs using a (manually or automatically generated) prompt, and we fine-tune the language model.
The resulting relation embeddings are highly competitive on analogy (unsupervised) and relation classification (supervised) benchmarks.
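The intuition behind prompt-derived relation embeddings can be sketched with a toy example in which vector offsets stand in for the language-model pair encodings. Everything below is synthetic and illustrative; the paper itself encodes each word pair with a prompt fed to a fine-tuned language model.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16

# toy word vectors: capitals and their countries share a common
# "capital-of" offset, plus a little noise
paris, tokyo, apple = (rng.normal(size=dim) for _ in range(3))
capital_of = rng.normal(size=dim)
france = paris + capital_of + 0.1 * rng.normal(size=dim)
japan = tokyo + capital_of + 0.1 * rng.normal(size=dim)
fruit = apple + rng.normal(size=dim)  # an unrelated relation

def relation_embedding(head, tail):
    # crude stand-in for encoding the pair with a prompt such as
    # "[head] is related to [tail]" through a fine-tuned language model
    return tail - head

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

r1 = relation_embedding(paris, france)
r2 = relation_embedding(tokyo, japan)
r3 = relation_embedding(apple, fruit)

# analogous pairs should yield more similar relation embeddings
print(f"analogous: {cosine(r1, r2):.2f}  unrelated: {cosine(r1, r3):.2f}")
```

This is why such embeddings transfer well to analogy benchmarks: pairs instantiating the same relation land near each other in the embedding space.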
arXiv Detail & Related papers (2021-09-21T15:05:27Z)
- On the Lack of Robust Interpretability of Neural Text Classifiers [14.685352584216757]
We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
Both tests reveal surprising deviations from expected behavior, raising questions about how much insight practitioners can draw from such interpretations.
arXiv Detail & Related papers (2021-06-08T18:31:02Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Are Some Words Worth More than Others? [3.5598388686985354]
We propose two new intrinsic evaluation measures within the framework of a simple word prediction task.
We evaluate several commonly-used large English language models using our proposed metrics.
arXiv Detail & Related papers (2020-10-12T23:12:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.