Unsupervised Heterogeneous Coupling Learning for Categorical
Representation
- URL: http://arxiv.org/abs/2007.10720v1
- Date: Tue, 21 Jul 2020 11:23:27 GMT
- Title: Unsupervised Heterogeneous Coupling Learning for Categorical
Representation
- Authors: Chengzhang Zhu, Longbing Cao, and Jianping Yin
- Abstract summary: This work introduces a UNsupervised heTerogeneous couplIng lEarning (UNTIE) approach for representing coupled categorical data by untying the interactions between couplings.
UNTIE is efficiently optimized w.r.t. a kernel k-means objective function for unsupervised representation learning of heterogeneous and hierarchical value-to-object couplings.
The UNTIE-learned representations make significant performance improvement against the state-of-the-art categorical representations and deep representation models.
- Score: 50.1603042640492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex categorical data is often hierarchically coupled with heterogeneous
relationships between attributes and attribute values and the couplings between
objects. Such value-to-object couplings are heterogeneous with complementary
and inconsistent interactions and distributions. Limited research exists on
unlabeled categorical data representations, ignores the heterogeneous and
hierarchical couplings, underestimates data characteristics and complexities,
and overuses redundant information, etc. The deep representation learning of
unlabeled categorical data is challenging, overseeing such value-to-object
couplings, complementarity and inconsistency, and requiring large data,
disentanglement, and high computational power. This work introduces a shallow
but powerful UNsupervised heTerogeneous couplIng lEarning (UNTIE) approach for
representing coupled categorical data by untying the interactions between
couplings and revealing heterogeneous distributions embedded in each type of
couplings. UNTIE is efficiently optimized w.r.t. a kernel k-means objective
function for unsupervised representation learning of heterogeneous and
hierarchical value-to-object couplings. Theoretical analysis shows that UNTIE
can represent categorical data with maximal separability while effectively
represent heterogeneous couplings and disclose their roles in categorical data.
The UNTIE-learned representations make significant performance improvement
against the state-of-the-art categorical representations and deep
representation models on 25 categorical data sets with diversified
characteristics.
Related papers
- Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph [18.06658040186476]
We introduce a novel framework named the Bipartite and Complete Directed Graph Neural Network (BCGNN)
Within BCGNN, observations and features are differentiated as two distinct node types, and the values of observed features are converted into attributed edges linking them.
In parallel, the complete directed graph segment adeptly outlines and communicates the complex interdependencies among features.
arXiv Detail & Related papers (2024-11-07T17:48:37Z) - Generalizable Heterogeneous Federated Cross-Correlation and Instance
Similarity Learning [60.058083574671834]
This paper presents a novel FCCL+, federated correlation and similarity learning with non-target distillation.
For heterogeneous issue, we leverage irrelevant unlabeled public data for communication.
For catastrophic forgetting in local updating stage, FCCL+ introduces Federated Non Target Distillation.
arXiv Detail & Related papers (2023-09-28T09:32:27Z) - Jointprop: Joint Semi-supervised Learning for Entity and Relation
Extraction with Heterogeneous Graph-based Propagation [13.418617500641401]
We propose Jointprop, a Heterogeneous Graph-based Propagation framework for joint semi-supervised entity and relation extraction.
We construct a unified span-based heterogeneous graph from entity and relation candidates and propagate class labels based on confidence scores.
We show that our framework outperforms the state-of-the-art semi-supervised approaches on NER and RE tasks.
arXiv Detail & Related papers (2023-05-25T09:07:04Z) - HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised
Relation Extraction [60.80849503639896]
Unsupervised relation extraction aims to extract the relationship between entities from natural language sentences without prior information on relational scope or distribution.
We propose a novel contrastive learning framework named HiURE, which has the capability to derive hierarchical signals from relational feature space using cross hierarchy attention.
Experimental results on two public datasets demonstrate the advanced effectiveness and robustness of HiURE on unsupervised relation extraction when compared with state-of-the-art models.
arXiv Detail & Related papers (2022-05-04T17:56:48Z) - Scalable Regularised Joint Mixture Models [2.0686407686198263]
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions.
We propose an approach for heterogeneous data that allows joint learning of (i) explicit multivariate feature distributions, (ii) high-dimensional regression models and (iii) latent group labels.
The approach is demonstrably effective in high dimensions, combining data reduction for computational efficiency with a re-weighting scheme that retains key signals even when the number of features is large.
arXiv Detail & Related papers (2022-05-03T13:38:58Z) - Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via
Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z) - Cross-Supervised Joint-Event-Extraction with Heterogeneous Information
Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z) - Layer-stacked Attention for Heterogeneous Network Embedding [0.0]
Layer-stacked ATTention Embedding (LATTE) is an architecture that automatically decomposes higher-order meta relations at each layer.
LATTE offers a more interpretable aggregation scheme for nodes of different types at different neighborhood ranges.
In both transductive and inductive node classification tasks, LATTE can achieve state-of-the-art performance compared to existing approaches.
arXiv Detail & Related papers (2020-09-17T05:13:41Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimize a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.