Related papers: COMBO: A Complete Benchmark for Open KG Canonicalization

URL: http://arxiv.org/abs/2302.03905v1
Date: Wed, 8 Feb 2023 06:46:01 GMT
Title: COMBO: A Complete Benchmark for Open KG Canonicalization
Authors: Chengyue Jiang, Yong Jiang, Weiqi Wu, Yuting Zheng, Pengjun Xie, Kewei Tu
Abstract summary: Open knowledge graph (KG) consists of (subject, relation, object) triples extracted from millions of raw text. Subject and object noun phrases and the relation in open KG have severe redundancy and ambiguity and need to be canonicalized. We present COMBO, a Complete Benchmark for Open KG canonicalization.
Score: 44.01719343528974
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open knowledge graph (KG) consists of (subject, relation, object) triples extracted from millions of raw text. The subject and object noun phrases and the relation in open KG have severe redundancy and ambiguity and need to be canonicalized. Existing datasets for open KG canonicalization only provide gold entity-level canonicalization for noun phrases. In this paper, we present COMBO, a Complete Benchmark for Open KG canonicalization. Compared with existing datasets, we additionally provide gold canonicalization for relation phrases, gold ontology-level canonicalization for noun phrases, as well as source sentences from which triples are extracted. We also propose metrics for evaluating each type of canonicalization. On the COMBO dataset, we empirically compare previously proposed canonicalization methods as well as a few simple baseline methods based on pretrained language models. We find that properly encoding the phrases in a triple using pretrained language models results in better relation canonicalization and ontology-level canonicalization of the noun phrase. We release our dataset, baselines, and evaluation scripts at https://github.com/jeffchy/COMBO/tree/main.
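
Canonicalization output is a grouping of phrases into clusters, so evaluating it means comparing a predicted clustering against a gold one. The abstract does not spell out the proposed metrics. The sketch below shows pairwise precision, recall, and F1, a commonly used clustering measure, purely as an illustration. It is not taken from the COMBO evaluation scripts, and the function names and toy phrases are assumptions made for this example.

```python
from itertools import combinations


def same_cluster_pairs(clusters):
    """Return the set of unordered phrase pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sort so each unordered pair has one canonical tuple form.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall, and F1 of a predicted clustering."""
    pred_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    hits = len(pred_pairs & gold_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: noun phrases grouped by the entity they refer to.
gold = [{"NYC", "New York City", "the Big Apple"}, {"Apple", "Apple Inc."}]
predicted = [{"NYC", "New York City"}, {"the Big Apple", "Apple", "Apple Inc."}]

precision, recall, f1 = pairwise_prf(predicted, gold)
print(precision, recall, f1)
```

In this toy case the prediction wrongly places "the Big Apple" with the company, so two of its four same-cluster pairs are correct and two of the four gold pairs are recovered.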

Related papers

Learning Rules from KGs Guided by Language Models [48.858741745144044]
Rule learning methods can be applied to predict potentially missing facts. Ranking of rules is especially challenging over highly incomplete or biased KGs. With the recent rise of Language Models (LMs) several works have claimed that LMs can be used as alternative means for KG completion.
arXiv Detail & Related papers (2024-09-12T09:27:36Z)
Knowledge Graph Completion using Structural and Textual Embeddings [0.0]
We propose a relation prediction model that harnesses both textual and structural information within Knowledge Graphs. Our approach integrates walk-based embeddings with language model embeddings to effectively represent nodes. We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.
arXiv Detail & Related papers (2024-04-24T21:04:14Z)
Joint Open Knowledge Base Canonicalization and Linking [24.160755953937763]
Noun phrases (NPs) and relation phrases (RPs) in Open Knowledge Bases are not canonicalized. We propose a novel framework, JOCL, based on a factor graph model, to make canonicalization and linking reinforce each other.
arXiv Detail & Related papers (2022-12-02T14:38:58Z)
Clustering Semantic Predicates in the Open Research Knowledge Graph [0.0]
We describe our approach tailoring two AI-based clustering algorithms for recommending predicates about resources in the Open Research Knowledge Graph (ORKG). Our experiments show very promising results: a high precision with relatively high recall in linear runtime performance. This work offers novel insights into the predicate groups that automatically accrue loosely as generic semantification patterns for semantification of scholarly knowledge spanning 44 research fields.
arXiv Detail & Related papers (2022-10-05T05:48:39Z)
KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data. In our experiments, we highlight the findings with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z)
Multi-View Clustering for Open Knowledge Base Canonicalization [9.976636206355394]
Noun phrases and relation phrases in large open knowledge bases (OKBs) are not canonicalized. We propose CMVC, a novel unsupervised framework that leverages two views of knowledge jointly for canonicalizing OKBs. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
arXiv Detail & Related papers (2022-06-22T14:23:16Z)
MEKER: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering [65.62309538202771]
Knowledge Graphs (KGs) are symbolically structured storages of facts. KG embedding contains concise data used in NLP tasks requiring implicit information about the real world. We propose a memory-efficient KG embedding model, which yields SOTA-comparable performance on link prediction tasks and KG-based Question Answering.
arXiv Detail & Related papers (2022-04-22T10:47:03Z)
More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ. We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders [11.259587284318835]
Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to this problem work in two steps: first, they generate embedding representations for both noun and relation phrases; then, a clustering algorithm groups the phrases using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach.
arXiv Detail & Related papers (2020-12-08T22:58:30Z)
Graph Structured Network for Image-Text Matching [127.68148793548116]
We present a novel Graph Structured Matching Network to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase. Experiments show that GSMN outperforms state-of-the-art methods on benchmarks.
arXiv Detail & Related papers (2020-04-01T08:20:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.