Linking Surface Facts to Large-Scale Knowledge Graphs
- URL: http://arxiv.org/abs/2310.14909v1
- Date: Mon, 23 Oct 2023 13:18:49 GMT
- Title: Linking Surface Facts to Large-Scale Knowledge Graphs
- Authors: Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence,
Goran Glavaš
- Abstract summary: Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples.
Knowledge Graphs (KGs) contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema.
We propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open Information Extraction (OIE) methods extract facts from natural language
text in the form of ("subject"; "relation"; "object") triples. These facts are,
however, merely surface forms, the ambiguity of which impedes their downstream
usage; e.g., the surface phrase "Michael Jordan" may refer to either the former
basketball player or the university professor. Knowledge Graphs (KGs), on the
other hand, contain facts in a canonical (i.e., unambiguous) form, but their
coverage is limited by a static schema (i.e., a fixed set of entities and
predicates). To bridge this gap, we need the best of both worlds: (i) high
coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of
KGs. In order to achieve this goal, we propose a new benchmark with novel
evaluation protocols that can, for example, measure fact linking performance on
a granular triple slot level, while also measuring whether a system can
recognize that a surface form has no match in the existing KG. Our extensive
evaluation of several baselines shows that detection of out-of-KG entities and
predicates is more difficult than accurate linking to existing ones, thus
calling for more research efforts on this difficult task. We publicly release
all resources (data, benchmark and code) on
https://github.com/nec-research/fact-linking.
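To make the task concrete, here is a minimal sketch of slot-level fact linking with out-of-KG detection. The toy KG, the string-similarity scorer, and the threshold are illustrative assumptions, not the benchmark's actual linking models:

```python
# Toy slot-level fact linker: each triple slot is linked independently,
# and a slot with no sufficiently similar KG candidate is flagged as
# out-of-KG (None). Scorer and threshold are illustrative assumptions.
from difflib import SequenceMatcher

KG_ENTITIES = {"Q41421": "Michael Jordan (basketball player)",
               "Q3308285": "Michael I. Jordan (computer scientist)"}
KG_PREDICATES = {"P108": "employer", "P54": "member of sports team"}

def link_slot(surface, candidates, threshold=0.5):
    """Return the best-matching KG id for a surface form, or None if out-of-KG."""
    def score(label):  # toy string similarity; real linkers use dense encoders
        return SequenceMatcher(None, surface.lower(), label.lower()).ratio()
    best_id, best_label = max(candidates.items(), key=lambda kv: score(kv[1]))
    return best_id if score(best_label) >= threshold else None

def link_triple(subj, rel, obj):
    return (link_slot(subj, KG_ENTITIES),
            link_slot(rel, KG_PREDICATES),
            link_slot(obj, KG_ENTITIES))

# Only the subject clears the threshold here; the relation and object are
# detected as out-of-KG, the harder sub-task highlighted by the paper.
print(link_triple("Michael Jordan", "played for", "Chicago Bulls"))
```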
Related papers
- EntailE: Introducing Textual Entailment in Commonsense Knowledge Graph
Completion [54.12709176438264]
Commonsense knowledge graphs (CSKGs) utilize free-form text to represent named entities, short phrases, and events as their nodes.
Current methods leverage semantic similarities to increase the graph density, but the semantic plausibility of the nodes and their relations is under-explored.
We propose to adopt textual entailment to find implicit entailment relations between CSKG nodes, to effectively densify the subgraph connecting nodes within the same conceptual class.
arXiv Detail & Related papers (2024-02-15T02:27:23Z)
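A minimal sketch of EntailE's densification idea above, assuming an off-the-shelf NLI model; the model choice, node texts, and confidence threshold are illustrative, not the paper's actual setup:

```python
# Densify a commonsense KG by adding an edge wherever one node's text
# entails another's, according to a pretrained NLI model.
from itertools import permutations
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

nodes = ["PersonX buys groceries", "PersonX goes shopping",
         "PersonX pays at the register"]

def add_entailment_edges(nodes, threshold=0.9):
    edges = []
    for premise, hypothesis in permutations(nodes, 2):
        pred = nli([{"text": premise, "text_pair": hypothesis}])[0]
        if pred["label"] == "ENTAILMENT" and pred["score"] >= threshold:
            edges.append((premise, "entails", hypothesis))
    return edges

print(add_entailment_edges(nodes))
```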
- Text-To-KG Alignment: Comparing Current Methods on Classification Tasks [2.191505742658975]
Knowledge graphs (KGs) provide dense and structured representations of factual information.
Recent work has focused on creating pipeline models that retrieve information from KGs as additional context.
It is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query.
arXiv Detail & Related papers (2023-06-05T13:45:45Z)
- Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language Models [53.09723678623779]
We propose TAGREAL, which automatically generates high-quality query prompts and retrieves support information from large text corpora.
The results show that TAGREAL achieves state-of-the-art performance on two benchmark datasets.
We find that TAGREAL has superb performance even with limited training data, outperforming existing embedding-based, graph-based, and PLM-based methods.
arXiv Detail & Related papers (2023-05-24T22:09:35Z)
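TAGREAL's core ingredient, querying a pre-trained LM with a cloze prompt for a missing triple slot, can be sketched as follows; the template and example query are hand-written assumptions, whereas TAGREAL mines prompts and support text automatically:

```python
# Query a masked LM for the tail entity of an incomplete triple (h, r, ?).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def complete_tail(head, relation_template, top_k=3):
    """Rank candidate tail entities for (head, relation, ?) via a cloze prompt."""
    prompt = relation_template.format(head=head, mask=fill.tokenizer.mask_token)
    return [(p["token_str"], round(p["score"], 3)) for p in fill(prompt, top_k=top_k)]

# e.g. a hand-written prompt for a (team, league, ?) query
print(complete_tail("the Chicago Bulls", "{head} play in the {mask}."))
```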
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
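ConGraT's batch-wise objective is essentially the symmetric InfoNCE loss used by CLIP; a self-contained sketch with random stand-ins for the LM and GNN outputs:

```python
# Symmetric contrastive loss aligning text and node embeddings: the i-th
# text in a batch should be most similar to the i-th node, and vice versa.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, node_emb, temperature=0.07):
    text_emb = F.normalize(text_emb, dim=-1)
    node_emb = F.normalize(node_emb, dim=-1)
    logits = text_emb @ node_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(len(logits))             # matching pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# stand-ins for a batch of 8 LM and GNN embeddings of dimension 256
print(contrastive_loss(torch.randn(8, 256), torch.randn(8, 256)).item())
```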
- Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics of knowledge triplets' natural language descriptions together with their structure information.
Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models.
Our experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
- SKILL: Structured Knowledge Infusion for Large Language Models [46.34209061364282]
We propose a method to infuse structured knowledge into large language models (LLMs).
We show that models pre-trained on Wikidata KG with our method outperform the T5 baselines on FreebaseQA and WikiHop.
We observe a 3x improvement in exact-match score on the MetaQA task compared to the T5 baseline.
arXiv Detail & Related papers (2022-05-17T09:12:22Z)
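The infusion step can be pictured as turning KG triples into text an LLM can consume during pretraining; the verbalization template and the T5-style masking below are illustrative assumptions, not SKILL's exact pipeline:

```python
# Convert KG triples into pretraining text: either plain sentences or
# T5-style denoising pairs that mask one slot of the triple.
triples = [
    ("Douglas Adams", "educated at", "St John's College"),
    ("The Hitchhiker's Guide to the Galaxy", "author", "Douglas Adams"),
]

def verbalize(subj, pred, obj):
    """Naive one-template verbalization of a (subject, predicate, object) triple."""
    return f"{subj} {pred} {obj}."

def to_t5_example(subj, pred, obj):
    """A T5-style pair: mask the object span in the input, predict it as the target."""
    return f"{subj} {pred} <extra_id_0>.", f"<extra_id_0> {obj}"

for t in triples:
    print(verbalize(*t))
    print(to_t5_example(*t))
```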
- Trustworthy Knowledge Graph Completion Based on Multi-sourced Noisy Data [35.938323660176145]
We propose a new trustworthy method that derives facts for a knowledge graph from multi-sourced noisy data and the facts already in the KG.
Specifically, we introduce a graph neural network with a holistic scoring function to judge the plausibility of facts with various value types.
We present a truth inference model that incorporates data source qualities into the fact scoring function, and design a semi-supervised learning approach to infer the truth from heterogeneous values.
arXiv Detail & Related papers (2022-01-21T07:59:16Z)
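The truth-inference component can be illustrated with a classic truth-discovery loop that jointly estimates source quality and value plausibility; the claims and the update rule here are toy assumptions, far simpler than the paper's GNN-based scoring:

```python
# Jointly estimate which claimed value is true and how reliable each
# source is: trusted sources get more voting weight, and a source's
# weight is the fraction of its claims that agree with current truths.
from collections import defaultdict

# (source, fact id, claimed value)
claims = [("src_a", "Einstein:born", 1879), ("src_b", "Einstein:born", 1879),
          ("src_c", "Einstein:born", 1897)]

weights = {src: 1.0 for src, _, _ in claims}
for _ in range(10):
    scores = defaultdict(float)                 # weighted votes per (fact, value)
    for src, fact, value in claims:
        scores[(fact, value)] += weights[src]
    truths = {}
    for (fact, value), s in scores.items():
        if s > scores.get((fact, truths.get(fact)), -1.0):
            truths[fact] = value
    for src in weights:
        own = [(f, v) for s, f, v in claims if s == src]
        weights[src] = sum(v == truths[f] for f, v in own) / len(own)

print(truths, weights)  # 1879 wins; the outlier source's weight drops
```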
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
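KILT's strong baseline, a shared dense vector index feeding a seq2seq reader, can be sketched as follows; the encoder, passages, and query are illustrative stand-ins, not the benchmark's actual components:

```python
# Dense retrieval over a (tiny) passage collection; the retrieved context
# would then be fed, together with the query, to a seq2seq reader.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

passages = ["Michael Jordan played 15 seasons in the NBA.",
            "Michael I. Jordan is a professor at UC Berkeley."]
index = encoder.encode(passages, normalize_embeddings=True)  # the dense index

def retrieve(query, k=1):
    q = encoder.encode([query], normalize_embeddings=True)
    ranked = np.argsort(-(index @ q.T).ravel())
    return [passages[i] for i in ranked[:k]]

print(retrieve("Which university employs Michael Jordan?"))
```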
- Efficient Knowledge Graph Validation via Cross-Graph Representation Learning [40.570585195713704]
Noisy facts are unavoidably introduced into knowledge graphs, for example through automatic extraction.
We propose a cross-graph representation learning framework, CrossVal, which leverages an external KG to efficiently validate the facts in the target KG.
arXiv Detail & Related papers (2020-08-16T20:51:17Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, motivating automatic completion.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, rely on the text of graph triples and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
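As a worked example of the graph-embedding family mentioned above, TransE scores a triple (h, r, t) by how well the relation vector translates the head to the tail, h + r ≈ t; the embeddings below are untrained random stand-ins:

```python
import torch

dim, margin = 50, 1.0
h, r, t = (torch.randn(dim) for _ in range(3))
t_corrupt = torch.randn(dim)  # negative sample: a corrupted tail entity

def distance(h, r, t):
    """TransE distance; smaller means the triple is more plausible."""
    return torch.norm(h + r - t, p=2)

# margin ranking loss: push true triples at least `margin` below corrupted ones
loss = torch.clamp(margin + distance(h, r, t) - distance(h, r, t_corrupt), min=0)
print(distance(h, r, t).item(), loss.item())
```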
- Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs [96.73259297063619]
We consider a novel zero-shot learning formulation to avoid the cumbersome curation of training data for newly added relations.
For newly-added relations, we attempt to learn their semantic features from their text descriptions.
We leverage Generative Adversarial Networks (GANs) to establish the connection between the text and knowledge graph domains.
arXiv Detail & Related papers (2020-01-08T01:19:08Z)
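A toy sketch of the adversarial setup: a generator maps a relation's encoded text description to a relation embedding (so unseen relations can be embedded zero-shot), while a discriminator separates real, KG-trained relation embeddings from generated ones. All sizes and tensors are illustrative assumptions:

```python
import torch
import torch.nn as nn

text_dim, rel_dim = 768, 100
G = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(), nn.Linear(256, rel_dim))
D = nn.Sequential(nn.Linear(rel_dim, 256), nn.ReLU(), nn.Linear(256, 1))

text_emb = torch.randn(4, text_dim)  # encoded descriptions of 4 unseen relations
real_rel = torch.randn(4, rel_dim)   # stand-ins for KG-trained relation embeddings

bce = nn.BCEWithLogitsLoss()
# discriminator: real embeddings -> 1, generated embeddings -> 0
d_loss = (bce(D(real_rel), torch.ones(4, 1)) +
          bce(D(G(text_emb).detach()), torch.zeros(4, 1)))
# generator: fool the discriminator into scoring generated embeddings as real
g_loss = bce(D(G(text_emb)), torch.ones(4, 1))
print(d_loss.item(), g_loss.item())
```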