Multi-View Clustering for Open Knowledge Base Canonicalization
- URL: http://arxiv.org/abs/2206.11130v1
- Date: Wed, 22 Jun 2022 14:23:16 GMT
- Title: Multi-View Clustering for Open Knowledge Base Canonicalization
- Authors: Wei Shen, Yang Yang, Yinan Liu
- Abstract summary: Noun phrases and relation phrases in large open knowledge bases (OKBs) are not canonicalized.
We propose CMVC, a novel unsupervised framework that leverages two views of knowledge jointly for canonicalizing OKBs.
We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
- Score: 9.976636206355394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open information extraction (OIE) methods extract a large number of OIE triples <noun
phrase, relation phrase, noun phrase> from unstructured text, which compose
large open knowledge bases (OKBs). Noun phrases and relation phrases in such
OKBs are not canonicalized, which leads to scattered and redundant facts. It is
found that two views of knowledge (i.e., a fact view based on the fact triple
and a context view based on the fact triple's source context) provide
complementary information that is vital to the task of OKB canonicalization,
which clusters synonymous noun phrases and relation phrases into the same group
and assigns them unique identifiers. However, these two views of knowledge have
so far been leveraged in isolation by existing works. In this paper, we propose
CMVC, a novel unsupervised framework that leverages these two views of
knowledge jointly for canonicalizing OKBs without the need for manually
annotated labels. To achieve this goal, we propose a multi-view CH K-Means
clustering algorithm to mutually reinforce the clustering of view-specific
embeddings learned from each view by considering their different clustering
qualities. In order to further enhance the canonicalization performance, we
propose a training data optimization strategy in terms of data quantity and
data quality respectively in each particular view to refine the learned
view-specific embeddings in an iterative manner. Additionally, we propose a
Log-Jump algorithm to predict the optimal number of clusters in a data-driven
way without requiring any labels. We demonstrate the superiority of our
framework through extensive experiments on multiple real-world OKB data sets
against state-of-the-art methods.
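The multi-view weighting idea in the abstract can be illustrated with a minimal sketch. The "CH" in CH K-Means refers to the Calinski-Harabasz index; the simple weight-and-concatenate fusion below is an illustrative assumption, not the paper's exact mutual-reinforcement algorithm, and the farthest-point initialization is chosen only to keep the demo stable.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):  # pick each next centroid far from the existing ones
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[int(d.argmax())])
    centroids = np.array(centroids)
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def calinski_harabasz(X, labels):
    """CH index: ratio of between- to within-cluster dispersion."""
    n, ids = len(X), np.unique(labels)
    k, overall = len(ids), X.mean(axis=0)
    between = sum(len(X[labels == j]) * ((X[labels == j].mean(axis=0) - overall) ** 2).sum()
                  for j in ids)
    within = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                 for j in ids)
    return (between / (k - 1)) / (within / (n - k))

def multi_view_kmeans(views, k):
    """Cluster each view, score its clustering quality with the CH index,
    then re-cluster the quality-weighted concatenation of the views."""
    scores = np.array([calinski_harabasz(X, kmeans(X, k)) for X in views])
    weights = scores / scores.sum()
    fused = np.hstack([np.sqrt(w) * X for w, X in zip(weights, views)])
    return kmeans(fused, k)
```

In this toy version a noisier view earns a lower CH score and therefore contributes less to the fused representation, which is the intuition behind weighting views by their clustering quality.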
Related papers
- Discriminative Anchor Learning for Efficient Multi-view Clustering [59.11406089896875]
We propose discriminative anchor learning for multi-view clustering (DALMC).
We learn discriminative view-specific feature representations according to the original dataset.
We build anchors from different views based on these representations, which increase the quality of the shared anchor graph.
arXiv Detail & Related papers (2024-09-25T13:11:17Z)
- Open Knowledge Base Canonicalization with Multi-task Learning [18.053863554106307]
Large open knowledge bases (OKBs) are integral to many knowledge-driven applications on the world wide web such as web search.
Noun phrases and relation phrases in OKBs often suffer from redundancy and ambiguity, which calls for investigation into OKB canonicalization.
Current solutions address OKB canonicalization by devising advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the canonicalization process.
We put forward a multi-task learning framework, namely MulCanon, to tackle OKB canonicalization.
arXiv Detail & Related papers (2024-03-21T08:03:46Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
- CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.
Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods.
We propose to leverage multi-modal (vision and language) models, in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z)
- Joint Open Knowledge Base Canonicalization and Linking [24.160755953937763]
Noun phrases (NPs) and relation phrases (RPs) in Open Knowledge Bases are not canonicalized.
We propose JOCL, a novel framework based on a factor graph model, to make the two tasks reinforce each other.
arXiv Detail & Related papers (2022-12-02T14:38:58Z)
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary.
By enriching the concepts with their descriptions, we explicitly build the relationships among various concepts to facilitate the open-domain learning.
The proposed framework demonstrates strong zero-shot detection performances, e.g., on the LVIS dataset, our DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
arXiv Detail & Related papers (2022-09-20T02:01:01Z)
- Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders [11.259587284318835]
Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples.
Existing approaches to this problem take two steps: first, they generate embedding representations for both noun and relation phrases, then a clustering algorithm groups the phrases using the embeddings as features.
In this work, we propose Canonicalizing Using Variational AutoEncoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach.
arXiv Detail & Related papers (2020-12-08T22:58:30Z)
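The two-step embed-then-cluster pipeline described in the CUVA entry above can be sketched as follows. The character-bigram embedding and the cosine-similarity threshold are toy stand-ins chosen for illustration; real systems learn the phrase representations rather than deriving them from surface characters.

```python
import numpy as np
from itertools import combinations

def embed(phrases):
    """Toy stand-in embeddings: L2-normalized character-bigram counts."""
    vocab = sorted({p.lower()[i:i + 2] for p in phrases for i in range(len(p) - 1)})
    index = {bg: i for i, bg in enumerate(vocab)}
    vecs = np.zeros((len(phrases), len(vocab)))
    for row, p in enumerate(phrases):
        q = p.lower()
        for i in range(len(q) - 1):
            vecs[row, index[q[i:i + 2]]] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)

def two_step_canonicalize(phrases, threshold=0.5):
    """Step 1: embed the noun phrases. Step 2: single-link clustering
    (union-find) over pairs whose cosine similarity clears `threshold`."""
    vecs = embed(phrases)
    parent = list(range(len(phrases)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i, j in combinations(range(len(phrases)), 2):
        if float(vecs[i] @ vecs[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(phrases))]
```

For example, `two_step_canonicalize(["barack obama", "Barack Obama", "obama, barack", "new york"])` groups the first three phrases under one cluster identifier and leaves "new york" in its own cluster, which is the canonicalization behavior the abstract describes.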
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.