Distilling Relation Embeddings from Pre-trained Language Models
- URL: http://arxiv.org/abs/2110.15705v1
- Date: Tue, 21 Sep 2021 15:05:27 GMT
- Title: Distilling Relation Embeddings from Pre-trained Language Models
- Authors: Asahi Ushio and Jose Camacho-Collados and Steven Schockaert
- Abstract summary: We show that it is possible to distill relation embeddings from pre-trained language models.
We encode word pairs using a (manually or automatically generated) prompt, and we fine-tune the language model so that relationally similar word pairs yield similar output vectors.
The resulting relation embeddings are highly competitive on analogy (unsupervised) and relation classification (supervised) benchmarks.
- Score: 35.718167335989854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models have been found to capture a surprisingly rich
amount of lexical knowledge, ranging from commonsense properties of everyday
concepts to detailed factual knowledge about named entities. Among others, this
makes it possible to distill high-quality word vectors from pre-trained
language models. However, it is currently unclear to what extent it is possible
to distill relation embeddings, i.e. vectors that characterize the relationship
between two words. Such relation embeddings are appealing because they can, in
principle, encode relational knowledge in a more fine-grained way than is
possible with knowledge graphs. To obtain relation embeddings from a
pre-trained language model, we encode word pairs using a (manually or
automatically generated) prompt, and we fine-tune the language model such that
relationally similar word pairs yield similar output vectors. We find that the
resulting relation embeddings are highly competitive on analogy (unsupervised)
and relation classification (supervised) benchmarks, even without any
task-specific fine-tuning. Source code to reproduce our experimental results
and the model checkpoints are available in the following repository:
https://github.com/asahi417/relbert
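To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it inserts a word pair into a prompt, mean-pools a RoBERTa model's hidden states into a relation vector, applies a triplet-style fine-tuning step so that relationally similar pairs move closer together, and uses cosine similarity between relation vectors to pick an analogy candidate. The prompt wording, pooling strategy, loss, and example pairs are illustrative assumptions; the exact prompts, training objective, and checkpoints are in the repository linked above.
```python
# Minimal sketch (assumptions noted in the text above); not the paper's exact code.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "roberta-base"  # the paper builds on RoBERTa; "base" keeps the sketch light
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)


def relation_embedding(head: str, tail: str) -> torch.Tensor:
    """Encode a word pair with a prompt and mean-pool the hidden states."""
    prompt = f"I finally discovered the relation between {head} and {tail}."
    inputs = tokenizer(prompt, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state      # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)     # (1, dim)


# Fine-tuning signal (sketch): a triplet-style objective pulls pairs that share
# a relation together and pushes unrelated pairs apart.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = torch.nn.TripletMarginLoss(margin=1.0)(
    relation_embedding("paris", "france"),  # anchor
    relation_embedding("tokyo", "japan"),   # positive: same relation (capital of)
    relation_embedding("paris", "lyon"),    # negative: a different relation
)
loss.backward()
optimizer.step()

# Unsupervised analogy solving: choose the candidate pair whose relation vector
# is most similar to the query pair's.
with torch.no_grad():
    query = relation_embedding("word", "language")
    candidates = [("note", "music"), ("tea", "coffee"), ("sound", "speaker")]
    best = max(
        candidates,
        key=lambda pair: torch.cosine_similarity(query, relation_embedding(*pair)).item(),
    )
print(best)  # ideally ("note", "music"): a word is part of a language as a note is part of music
```
In practice, the fine-tuned checkpoints from the repository would replace the off-the-shelf roberta-base weights used here.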
Related papers
- Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution [4.01799362940916]
We present a setup for training, evaluating and interpreting neural language models that uses artificial, language-like data.
The data is generated using a massive probabilistic grammar, which is itself derived from a large natural language corpus.
With access to the underlying true source, our results show striking differences in learning dynamics and outcomes between different classes of words.
arXiv Detail & Related papers (2023-10-23T12:03:01Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Modelling Commonsense Properties using Pre-Trained Bi-Encoders [40.327695801431375]
We study the possibility of fine-tuning language models to explicitly model concepts and their properties.
Our experimental results show that the resulting encoders allow us to predict commonsense properties with much higher accuracy than was previously possible.
arXiv Detail & Related papers (2022-10-06T09:17:34Z)
- Towards a Theoretical Understanding of Word and Relation Representation [8.020742121274418]
Representing words by vectors, or embeddings, enables computational reasoning.
We focus on word embeddings learned from text corpora and knowledge graphs.
arXiv Detail & Related papers (2022-02-01T15:34:58Z)
- BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? [35.381345454627]
We analyze the capabilities of transformer-based language models on an unsupervised task of identifying analogies.
Off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations.
Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.
arXiv Detail & Related papers (2021-05-11T11:38:49Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese.
We train these models on large amounts of data, achieving significantly improved performance over the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Fusing Context Into Knowledge Graph for Commonsense Reasoning [21.33294077354958]
We propose to utilize external entity descriptions to provide contextual information for graph entities.
For the CommonsenseQA task, our model first extracts concepts from the question and answer choices, and then finds a related triple between these concepts.
We achieve state-of-the-art results on the CommonsenseQA dataset, with an accuracy of 80.7% (single model) and 83.3% (ensemble model) on the official leaderboard.
arXiv Detail & Related papers (2020-12-09T00:57:49Z)
- Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)