COMBO: A Complete Benchmark for Open KG Canonicalization
- URL: http://arxiv.org/abs/2302.03905v1
- Date: Wed, 8 Feb 2023 06:46:01 GMT
- Title: COMBO: A Complete Benchmark for Open KG Canonicalization
- Authors: Chengyue Jiang, Yong Jiang, Weiqi Wu, Yuting Zheng, Pengjun Xie, Kewei Tu
- Abstract summary: An open knowledge graph (KG) consists of (subject, relation, object) triples extracted from millions of raw texts.
The subject and object noun phrases and the relation phrases in an open KG suffer from severe redundancy and ambiguity and need to be canonicalized.
We present COMBO, a Complete Benchmark for Open KG canonicalization.
- Score: 44.01719343528974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An open knowledge graph (KG) consists of (subject, relation, object)
triples extracted from millions of raw texts. The subject and object noun
phrases and the relation phrases in an open KG have severe redundancy and
ambiguity and need to be
canonicalized. Existing datasets for open KG canonicalization only provide gold
entity-level canonicalization for noun phrases. In this paper, we present
COMBO, a Complete Benchmark for Open KG canonicalization. Compared with
existing datasets, we additionally provide gold canonicalization for relation
phrases, gold ontology-level canonicalization for noun phrases, as well as
source sentences from which triples are extracted. We also propose metrics for
evaluating each type of canonicalization. On the COMBO dataset, we empirically
compare previously proposed canonicalization methods as well as a few simple
baseline methods based on pretrained language models. We find that properly
encoding the phrases in a triple using pretrained language models results in
better relation canonicalization and ontology-level canonicalization of the
noun phrase. We release our dataset, baselines, and evaluation scripts at
https://github.com/jeffchy/COMBO/tree/main.
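The abstract does not spell out the metrics COMBO proposes. As a hedged illustration only, the sketch below implements the macro, micro, and pairwise precision, recall, and F1 scores that earlier open KB canonicalization work such as CESI used to compare predicted phrase clusters with gold clusters; whether COMBO adopts these exact definitions is an assumption not confirmed by this page. The function names (`evaluate`, `macro_precision`, and so on) are hypothetical, and the sketch assumes the predicted and gold assignments cover the same set of phrases.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Set

# Maps each phrase (noun phrase or relation phrase) to a cluster id.
Assignment = Dict[Hashable, Hashable]


def groups(assign: Assignment) -> List[Set[Hashable]]:
    """Invert phrase -> cluster id into a list of clusters (sets of phrases)."""
    out: Dict[Hashable, Set[Hashable]] = {}
    for phrase, cid in assign.items():
        out.setdefault(cid, set()).add(phrase)
    return list(out.values())


def macro_precision(pred: Assignment, gold: Assignment) -> float:
    """Fraction of predicted clusters whose phrases all share one gold cluster."""
    clusters = groups(pred)
    pure = sum(1 for c in clusters if len({gold[p] for p in c}) == 1)
    return pure / len(clusters)


def micro_precision(pred: Assignment, gold: Assignment) -> float:
    """Purity: credit each predicted cluster with its largest gold overlap."""
    clusters = groups(pred)
    hits = sum(Counter(gold[p] for p in c).most_common(1)[0][1] for c in clusters)
    return hits / sum(len(c) for c in clusters)


def pairwise_precision(pred: Assignment, gold: Assignment) -> float:
    """Fraction of same-cluster phrase pairs in pred that are also same-cluster in gold."""
    pairs = [pair for c in groups(pred) for pair in combinations(c, 2)]
    if not pairs:
        return 0.0  # convention assumed here when pred contains only singletons
    return sum(1 for a, b in pairs if gold[a] == gold[b]) / len(pairs)


def f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def evaluate(pred: Assignment, gold: Assignment) -> Dict[str, Dict[str, float]]:
    """Score pred against gold; recall is precision with the two roles swapped."""
    metrics: Dict[str, Callable[[Assignment, Assignment], float]] = {
        "macro": macro_precision,
        "micro": micro_precision,
        "pairwise": pairwise_precision,
    }
    results = {}
    for name, prec in metrics.items():
        p, r = prec(pred, gold), prec(gold, pred)
        results[name] = {"precision": p, "recall": r, "f1": f1(p, r)}
    return results


if __name__ == "__main__":
    # Toy data: two gold entities; the prediction wrongly merges "Obama" into the NYC cluster.
    gold = {"NYC": "e1", "New York City": "e1", "Big Apple": "e1",
            "Obama": "e2", "Barack Obama": "e2"}
    pred = {"NYC": "c1", "New York City": "c1", "Obama": "c1",
            "Big Apple": "c2", "Barack Obama": "c3"}
    for name, scores in evaluate(pred, gold).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Computing recall by swapping the predicted and gold assignments keeps each metric to a single precision function. On the toy data, micro precision is 4/5 because three of the five phrases sit in a cluster dominated by their own gold entity, plus one each for the two singletons.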
Related papers
- Knowledge Graph Completion using Structural and Textual Embeddings [0.0]
We propose a relation prediction model that harnesses both textual and structural information within Knowledge Graphs.
Our approach integrates walk-based embeddings with language model embeddings to effectively represent nodes.
We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.
(arXiv, 2024-04-24T21:04:14Z)
- Joint Open Knowledge Base Canonicalization and Linking [24.160755953937763]
Noun phrases (NPs) and relation phrases (RPs) in Open Knowledge Bases (OKBs) are not canonicalized.
We propose JOCL, a novel framework based on a factor graph model that lets OKB canonicalization and linking reinforce each other.
(arXiv, 2022-12-02T14:38:58Z)
- Clustering Semantic Predicates in the Open Research Knowledge Graph [0.0]
We describe our approach of tailoring two AI-based clustering algorithms to recommend predicates about resources in the Open Research Knowledge Graph (ORKG).
Our experiments show very promising results: high precision with relatively high recall, at linear runtime.
This work offers novel insights into the predicate groups that automatically and loosely accrue as generic patterns for the semantification of scholarly knowledge spanning 44 research fields.
(arXiv, 2022-10-05T05:48:39Z)
- KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data.
In our experiments, we use KGxBoard to highlight findings that would have been impossible to detect with standard averaged single-score metrics.
(arXiv, 2022-08-23T15:11:45Z)
- Multi-View Clustering for Open Knowledge Base Canonicalization [9.976636206355394]
Noun phrases and relation phrases in large open knowledge bases (OKBs) are not canonicalized.
We propose CMVC, a novel unsupervised framework that leverages two views of knowledge jointly for canonicalizing OKBs.
We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
(arXiv, 2022-06-22T14:23:16Z)
- MEKER: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering [65.62309538202771]
Knowledge Graphs (KGs) are symbolically structured stores of facts.
KG embeddings contain concise data used in NLP tasks that require implicit information about the real world.
We propose a memory-efficient KG embedding model, which yields SOTA-comparable performance on link prediction tasks and KG-based Question Answering.
(arXiv, 2022-04-22T10:47:03Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
(arXiv, 2021-08-24T14:08:19Z)
- Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders [11.259587284318835]
Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples.
Existing approaches to this problem take two steps: first, they generate embedding representations for both noun and relation phrases; then, a clustering algorithm groups the phrases using the embeddings as features.
In this work, we propose Canonicalizing Using Variational AutoEncoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach.
(arXiv, 2020-12-08T22:58:30Z)
- Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment [100.19568734815732]
Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs.
Attribute triples can also provide crucial alignment signals but have not yet been well explored.
We propose to utilize an attributed value encoder and to partition the KG into subgraphs to efficiently model the various types of attribute triples.
(arXiv, 2020-10-07T08:03:58Z)
- Graph Structured Network for Image-Text Matching [127.68148793548116]
We present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence.
GSMN explicitly models objects, relations, and attributes as a structured phrase.
Experiments show that GSMN outperforms state-of-the-art methods on benchmarks.
(arXiv, 2020-04-01T08:20:42Z)
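The summary of the CUVA paper in the list above describes the common two-step recipe for canonicalization: embed noun and relation phrases, then cluster the embeddings. A minimal sketch of that recipe, under stated assumptions: hand-written toy vectors stand in for pretrained language model encodings, the 0.9 cosine threshold is arbitrary, and single-linkage clustering via union-find is a deliberately simple stand-in for the clustering algorithms used by the listed methods. `canonicalize` and `cosine` are hypothetical names, not part of any released COMBO code, and phrases are assumed to be unique.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def canonicalize(phrases: List[str], embeddings: List[Sequence[float]],
                 threshold: float) -> Dict[str, str]:
    """Map each phrase to a canonical representative of its cluster.

    Two phrases are linked when their cosine similarity is >= threshold; the
    clusters are the connected components of that graph, i.e. single-linkage
    clustering cut at the threshold. The representative of each cluster is
    its earliest-listed phrase.
    """
    if len(phrases) != len(embeddings):
        raise ValueError("need exactly one embedding per phrase")
    parent = list(range(len(phrases)))  # union-find forest over phrase indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root, so each root is its cluster's first phrase.
                    parent[max(ri, rj)] = min(ri, rj)

    return {p: phrases[find(i)] for i, p in enumerate(phrases)}


if __name__ == "__main__":
    # Hand-written toy vectors standing in for PLM phrase encodings (assumption).
    phrases = ["NYC", "New York City", "Barack Obama", "Obama"]
    embeddings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
    print(canonicalize(phrases, embeddings, threshold=0.9))
    # → {'NYC': 'NYC', 'New York City': 'NYC', 'Barack Obama': 'Barack Obama', 'Obama': 'Barack Obama'}
```

Single linkage is easy to reason about but can chain loosely related phrases into one cluster through intermediate neighbors. Because the output maps each phrase to a cluster representative, it can be compared directly with a gold phrase-to-entity mapping.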