RelBERT: Embedding Relations with Language Models
- URL: http://arxiv.org/abs/2310.00299v2
- Date: Sun, 8 Oct 2023 16:22:26 GMT
- Title: RelBERT: Embedding Relations with Language Models
- Authors: Asahi Ushio, Jose Camacho-Collados, Steven Schockaert
- Abstract summary: We propose to extract relation embeddings from relatively small language models.
RelBERT captures relational similarity in a surprisingly fine-grained way.
It is capable of modelling relations that go well beyond what the model has seen during training.
- Score: 29.528217625083546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many applications need access to background knowledge about how different
concepts and entities are related. Although Knowledge Graphs (KG) and Large
Language Models (LLM) can address this need to some extent, KGs are inevitably
incomplete and their relational schema is often too coarse-grained, while LLMs
are inefficient and difficult to control. As an alternative, we propose to
extract relation embeddings from relatively small language models. In
particular, we show that masked language models such as RoBERTa can be
straightforwardly fine-tuned for this purpose, using only a small amount of
training data. The resulting model, which we call RelBERT, captures relational
similarity in a surprisingly fine-grained way, allowing us to set a new
state-of-the-art in analogy benchmarks. Crucially, RelBERT is capable of
modelling relations that go well beyond what the model has seen during
training. For instance, we obtained strong results on relations between named
entities with a model that was only trained on lexical relations between
concepts, and we observed that RelBERT can recognise morphological analogies
despite not being trained on such examples. Overall, we find that RelBERT
significantly outperforms strategies based on prompting language models that
are several orders of magnitude larger, including recent GPT-based models and
open source models.
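As a rough illustration of the approach described above, the sketch below encodes a word pair with a prompt template and mean-pools the hidden states of a RoBERTa encoder into a single relation vector. The template, the roberta-base checkpoint and the pooling strategy are illustrative assumptions rather than the paper's exact configuration, and the fine-tuning step itself is not shown.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative prompt template; RelBERT's actual templates may differ.
TEMPLATE = "Today, I finally discovered the relation between {head} and {tail}."

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

def relation_embedding(head: str, tail: str) -> torch.Tensor:
    """Encode a word pair via a prompt and mean-pool the encoder's hidden states."""
    inputs = tokenizer(TEMPLATE.format(head=head, tail=tail), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, dim)

# Relational similarity between two pairs as cosine similarity of their vectors;
# analogy candidates can be ranked by this score against a query pair.
capital_1 = relation_embedding("Paris", "France")
capital_2 = relation_embedding("Tokyo", "Japan")
print(torch.cosine_similarity(capital_1, capital_2).item())
```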
Related papers
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models.
We find that there is a certain relationship between model kinship and the performance gains after model merging.
We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- "Medium" LMs of Code in the Era of LLMs: Lessons From StackOverflow [5.036273913335737]
We train two models, SOBertBase (109M parameters) and SOBertLarge (762M parameters), at a budget of just $187 and $800, respectively.
Results demonstrate that pre-training both extensively and properly on in-domain data can yield a powerful and affordable alternative to leveraging closed-source general-purpose models.
arXiv Detail & Related papers (2023-06-05T21:38:30Z)
- A RelEntLess Benchmark for Modelling Graded Relations between Named Entities [29.528217625083546]
We introduce a new benchmark, in which entity pairs have to be ranked according to how much they satisfy a given graded relation.
We find a strong correlation between model size and performance, with smaller Language Models struggling to outperform a naive baseline.
The results of the largest Flan-T5 and OPT models are remarkably strong, although a clear gap with human performance remains.
arXiv Detail & Related papers (2023-05-24T10:41:24Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
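As a minimal sketch of what merging models in their parameter space can look like, the snippet below averages the weights of two checkpoints fine-tuned from the same base; the paths are hypothetical placeholders, and plain element-wise averaging is only an illustrative stand-in for the paper's actual merging rule.

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical checkpoint paths: two models fine-tuned from the same pre-trained
# base and sharing one architecture, so their state dicts line up key by key.
model_a = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-a")
model_b = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-b")

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged = {
    # Average floating-point parameters; copy integer buffers unchanged.
    name: (state_a[name] + state_b[name]) / 2.0 if state_a[name].is_floating_point() else state_a[name]
    for name in state_a
}

model_a.load_state_dict(merged)          # reuse model_a as the merged model
model_a.save_pretrained("merged-model")  # no training data touched at any point
```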
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Entity-Assisted Language Models for Identifying Check-worthy Sentences [23.792877053142636]
We propose a new uniform framework for text classification and ranking.
Our framework combines the semantic analysis of the sentences with entity embeddings obtained from the entities identified within those sentences.
We extensively evaluate the effectiveness of our framework using two publicly available datasets from the CLEF's 2019 & 2020 CheckThat! Labs.
arXiv Detail & Related papers (2022-11-19T12:03:30Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Representing Knowledge by Spans: A Knowledge-Enhanced Model for Information Extraction [7.077412533545456]
We propose a new pre-trained model that learns representations of both entities and relationships simultaneously.
By encoding spans efficiently with span modules, our model can represent both entities and their relationships but requires fewer parameters than existing models.
arXiv Detail & Related papers (2022-08-20T07:32:25Z)
- Language Model Cascades [72.18809575261498]
Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
arXiv Detail & Related papers (2022-07-21T07:35:18Z)
- Interpreting Language Models Through Knowledge Graph Extraction [42.97929497661778]
We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa).
arXiv Detail & Related papers (2021-11-16T15:18:01Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.