Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and
Co-Expansion
- URL: http://arxiv.org/abs/2001.10106v1
- Date: Mon, 27 Jan 2020 22:34:07 GMT
- Title: Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and
Co-Expansion
- Authors: Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang and
Jiawei Han
- Abstract summary: Corpus-based set expansion algorithms bootstrap the given seeds by incorporating lexical patterns and distributional similarity.
Set-CoExpan automatically generates auxiliary sets as negative sets that are closely related to the target set of user's interest.
We show that Set-CoExpan outperforms strong baseline methods significantly.
- Score: 45.716171458483636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based
set expansion is to induce an extensive set of entities which share the same
semantic class (Country in this example) from a given corpus. Set expansion
benefits a wide range of downstream applications in knowledge discovery, such
as web search, taxonomy construction, and query suggestion. Existing
corpus-based set expansion algorithms typically bootstrap the given seeds by
incorporating lexical patterns and distributional similarity. However, because
no negative sets are provided explicitly, these methods suffer from semantic drift
caused by expanding the seed set freely without guidance. We propose a new
framework, Set-CoExpan, that automatically generates auxiliary sets as negative
sets that are closely related to the target set of user's interest, and then
performs multiple sets co-expansion that extracts discriminative features by
comparing target set with auxiliary sets, to form multiple cohesive sets that
are distinctive from one another, thus resolving the semantic drift issue. In
this paper we demonstrate that by generating auxiliary sets, we can guide the
expansion process of target set to avoid touching those ambiguous areas around
the border with auxiliary sets, and we show that Set-CoExpan outperforms strong
baseline methods significantly.
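The idea of using auxiliary sets as negatives can be illustrated with a toy sketch. This is not the Set-CoExpan implementation: the function `co_expand`, the two-dimensional vectors in `EMBEDDINGS`, and the margin score (similarity to the target centroid minus the highest similarity to any auxiliary centroid) are all assumptions made up for illustration. The sketch only shows how an ambiguous candidate near the border with an auxiliary set can be pushed down the ranking.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def co_expand(embeddings, target_seeds, auxiliary_seed_sets, k):
    """Rank non-seed entities by a margin score (hypothetical scoring rule).

    score = sim(entity, target centroid) - max sim(entity, auxiliary centroid)
    With no auxiliary sets the score reduces to plain target similarity.
    Returns the top-k (entity, score) pairs, best first.
    """
    target_c = centroid([embeddings[e] for e in target_seeds])
    aux_cs = [centroid([embeddings[e] for e in seeds])
              for seeds in auxiliary_seed_sets]
    all_seeds = set(target_seeds).union(*auxiliary_seed_sets)

    scored = []
    for entity, vec in embeddings.items():
        if entity in all_seeds:
            continue  # seeds are never candidates
        pos = cosine(vec, target_c)
        neg = max((cosine(vec, c) for c in aux_cs), default=0.0)
        scored.append((pos - neg, entity))
    scored.sort(reverse=True)
    return [(entity, score) for score, entity in scored[:k]]


# Invented toy vectors: axis 0 ~ "country-like", axis 1 ~ "state/province-like".
EMBEDDINGS = {
    "USA": [1.0, 0.1], "Russia": [0.9, 0.2],
    "Canada": [0.95, 0.15], "France": [0.9, 0.1],
    "Texas": [0.3, 0.9], "California": [0.3, 1.0],
    "Ontario": [0.35, 0.95],
    "Georgia": [0.7, 0.6],  # ambiguous: a country and a US state
}

if __name__ == "__main__":
    seeds = ["USA", "Russia"]
    print("without auxiliary sets:", co_expand(EMBEDDINGS, seeds, [], k=3))
    print("with auxiliary set:    ",
          co_expand(EMBEDDINGS, seeds, [["Texas", "California"]], k=3))
```

In this toy setting the ambiguous entity "Georgia" scores well when only the target seeds are used, but its margin drops close to zero once a states/provinces auxiliary set is supplied, while unambiguous countries keep a clear positive margin.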
Related papers
- FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion [36.714348668366]
We propose a sound and efficient formulation of set representation learning based on its volume approximation as a fuzzy set.
The resulting embedding framework, Fuzzy Set Embedding (FUSE), satisfies all set operations and compactly approximates the underlying fuzzy set.
arXiv Detail & Related papers (2025-06-10T03:28:32Z) - A Universal Sets-level Optimization Framework for Next Set Recommendation [15.808908615022709]
Next Set Recommendation (NSRec) stands as a trending research topic.
We unveil a universal and Sets-level optimization framework for Next Set Recommendation (SNSRec).
Our approach consistently outperforms previous methods on both relevance and diversity.
arXiv Detail & Related papers (2024-10-30T13:53:46Z) - SetCSE: Set Operations using Contrastive Learning of Sentence Embeddings [6.988934943372354]
SetCSE employs sets to represent complex semantics and incorporates well-defined operations for structured information querying.
We introduce an inter-set contrastive learning objective to enhance comprehension of sentence embedding models concerning the given semantics.
We demonstrate that SetCSE adheres to the conventions of human language expressions regarding compounded semantics.
arXiv Detail & Related papers (2024-04-25T02:05:30Z) - Enhancing Neural Subset Selection: Integrating Background Information into Set Representations [53.15923939406772]
We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest.
This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
arXiv Detail & Related papers (2024-02-05T16:09:35Z) - Foundational theories of hesitant fuzzy sets and hesitant fuzzy
information systems and their applications for multi-strength intelligent
classifiers [32.78664473821173]
Hesitant fuzzy sets are widely used in certain instances of uncertainty and hesitation.
As a kind of set, hesitant fuzzy sets require an explicit definition of inclusion relationship.
arXiv Detail & Related papers (2023-11-07T14:03:28Z) - Redundancy-aware unsupervised rankings for collections of gene sets [0.28675177318965034]
We propose to use importance scores to rank the pathways in the collections studying the context from a set covering perspective.
The proposed method shows a practical utility in bioinformatics to increase the interpretability of the collections of gene sets.
arXiv Detail & Related papers (2023-07-30T09:39:42Z) - GausSetExpander: A Simple Approach for Entity Set Expansion [0.0]
We propose GausSetExpander, an unsupervised approach based on optimal transport techniques.
We demonstrate the validity of our approach by comparing to state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-28T09:44:43Z) - Exploring Set Similarity for Dense Self-supervised Representation
Learning [96.35286140203407]
We propose to explore set similarity (SetSim) for dense self-supervised representation learning.
We generalize pixel-wise similarity learning to set-wise one to improve the robustness because sets contain more semantic and structure information.
Specifically, by resorting to attentional features of views, we establish corresponding sets, thus filtering out noisy backgrounds that may cause incorrect correspondences.
arXiv Detail & Related papers (2021-07-19T09:38:27Z) - Mini-Batch Consistent Slot Set Encoder for Scalable Set Encoding [50.61114177411961]
We introduce a new property termed Mini-Batch Consistency that is required for large scale mini-batch set encoding.
We present a scalable and efficient set encoding mechanism that is amenable to mini-batch processing with respect to set elements and capable of updating set representations as more data arrives.
arXiv Detail & Related papers (2021-03-02T10:10:41Z) - Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
arXiv Detail & Related papers (2020-04-29T00:09:43Z) - Learn to Predict Sets Using Feed-Forward Neural Networks [63.91494644881925]
This paper addresses the task of set prediction using deep feed-forward neural networks.
We present a novel approach for learning to predict sets with unknown permutation and cardinality.
We demonstrate the validity of our set formulations on relevant vision problems.
arXiv Detail & Related papers (2020-01-30T01:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.