Joint Entity and Relation Canonicalization in Open Knowledge Graphs
using Variational Autoencoders
- URL: http://arxiv.org/abs/2012.04780v1
- Date: Tue, 8 Dec 2020 22:58:30 GMT
- Title: Joint Entity and Relation Canonicalization in Open Knowledge Graphs
using Variational Autoencoders
- Authors: Sarthak Dash, Gaetano Rossiello, Nandana Mihindukulasooriya, Sugato
Bagchi, Alfio Gliozzo
- Abstract summary: Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples.
Existing approaches to this problem take a two-step approach: first, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features.
In this work, we propose Canonicalizing Using Variational AutoEncoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach.
- Score: 11.259587284318835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Noun phrases and relation phrases in open knowledge graphs are not
canonicalized, leading to an explosion of redundant and ambiguous
subject-relation-object triples. Existing approaches to this problem take
a two-step approach: first, they generate embedding representations for both
noun and relation phrases, then a clustering algorithm is used to group them
using the embeddings as features. In this work, we propose Canonicalizing Using
Variational AutoEncoders (CUVA), a joint model to learn both embeddings and
cluster assignments in an end-to-end approach, which leads to a better vector
representation for the noun and relation phrases. Our evaluation over multiple
benchmarks shows that CUVA outperforms the existing state-of-the-art
approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate
entity canonicalization systems.
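The two-step baseline that the abstract contrasts CUVA with can be sketched as follows. This is a minimal illustration only, not the paper's method and not CUVA: the character n-gram embedding, the greedy threshold clustering, the function names (`embed`, `cosine`, `cluster`), the threshold value, and the sample phrases are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from math import sqrt


def embed(phrase, n=3):
    """Step 1 (toy stand-in): bag of character n-grams as a sparse vector."""
    text = f" {phrase.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(phrases, threshold=0.5):
    """Step 2: greedy threshold clustering over the fixed embeddings."""
    vectors = {p: embed(p) for p in phrases}  # embeddings are frozen here
    clusters = []
    for phrase in phrases:
        for group in clusters:
            # Join the first cluster that contains a sufficiently similar phrase.
            if any(cosine(vectors[phrase], vectors[q]) >= threshold for q in group):
                group.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


# Hypothetical noun phrases, for illustration only.
noun_phrases = ["Barack Obama", "barack obama", "Obama, Barack", "New York City", "new york"]
groups = cluster(noun_phrases)
print(groups)
```

In this sketch the embeddings are computed once and never updated by the clustering step, which is the separation the abstract says a joint, end-to-end model is meant to avoid.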
Related papers
- Self Supervised Correlation-based Permutations for Multi-View Clustering [7.972599673048582]
We propose an end-to-end deep learning-based MVC framework for general data.
Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective.
We demonstrate the effectiveness of our model using ten MVC benchmark datasets.
arXiv Detail & Related papers (2024-02-26T08:08:30Z)
- Contextual Dictionary Lookup for Knowledge Graph Completion [32.493168863565465]
Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples.
Most existing embedding models map each relation into a unique vector, overlooking their specific fine-grained semantics under different entities.
We present a novel method utilizing contextual dictionary lookup, enabling conventional embedding models to learn fine-grained semantics of relations in an end-to-end manner.
arXiv Detail & Related papers (2023-06-13T12:13:41Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z)
- Repurposing Knowledge Graph Embeddings for Triple Representation via Weak Supervision [77.34726150561087]
Current methods learn triple embeddings from scratch without utilizing entity and predicate embeddings from pre-trained models.
We develop a method for automatically sampling triples from a knowledge graph and estimating their pairwise similarities from pre-trained embedding models.
These pairwise similarity scores are then fed to a Siamese-like neural architecture to fine-tune triple representations.
arXiv Detail & Related papers (2022-08-22T14:07:08Z)
- Multi-View Clustering for Open Knowledge Base Canonicalization [9.976636206355394]
Noun phrases and relation phrases in large open knowledge bases (OKBs) are not canonicalized.
We propose CMVC, a novel unsupervised framework that leverages two views of knowledge jointly for canonicalizing OKBs.
We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
arXiv Detail & Related papers (2022-06-22T14:23:16Z)
- Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings [14.225334321146779]
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm.
Our model uses a combination of sparse and dense document representations and aggregates document-cluster similarity along these multiple representations.
We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings.
arXiv Detail & Related papers (2021-01-26T19:58:30Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- Clustering-based Unsupervised Generative Relation Extraction [3.342376225738321]
We propose a Clustering-based Unsupervised generative Relation Extraction framework (CURE).
We use an "Encoder-Decoder" architecture to perform self-supervised learning so the encoder can extract relation information.
Our model performs better than state-of-the-art models on both New York Times (NYT) and United Nations Parallel Corpus (UNPC) standard datasets.
arXiv Detail & Related papers (2020-09-26T20:36:40Z)
- Dual Adversarial Auto-Encoders for Clustering [152.84443014554745]
We propose Dual Adversarial Auto-encoder (Dual-AAE) for unsupervised clustering.
By performing variational inference on the objective function of Dual-AAE, we derive a new reconstruction loss which can be optimized by training a pair of Auto-encoders.
Experiments on four benchmarks show that Dual-AAE achieves superior performance over state-of-the-art clustering methods.
arXiv Detail & Related papers (2020-08-23T13:16:34Z)
- Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.