eTREE: Learning Tree-structured Embeddings
- URL: http://arxiv.org/abs/2012.10853v1
- Date: Sun, 20 Dec 2020 06:06:08 GMT
- Title: eTREE: Learning Tree-structured Embeddings
- Authors: Faisal M. Almutairi, Yunlong Wang, Dong Wang, Emily Zhao, Nicholas D. Sidiropoulos
- Abstract summary: Matrix factorization (MF) plays an important role in a wide range of machine learning and data mining models.
MF is commonly used to obtain item embeddings and feature representations.
We propose eTREE, a model that incorporates the tree structure to enhance the quality of the embeddings.
- Score: 33.61635854505735
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Matrix factorization (MF) plays an important role in a wide range of machine
learning and data mining models. MF is commonly used to obtain item embeddings
and feature representations due to its ability to capture correlations and
higher-order statistical dependencies across dimensions. In many applications,
the categories of items exhibit a hierarchical tree structure. For instance,
human diseases can be divided into coarse categories, e.g., bacterial and
viral. These categories can be further divided into finer categories, e.g.,
viral infections can be respiratory, gastrointestinal, and exanthematous viral
diseases. In e-commerce, products, movies, books, etc., are grouped into
hierarchical categories, e.g., clothing items are divided by gender, then by
type (formal, casual, etc.). While the tree structure and the categories of the
different items may be known in some applications, they have to be learned
together with the embeddings in many others. In this work, we propose eTREE, a
model that incorporates the (usually ignored) tree structure to enhance the
quality of the embeddings. We leverage the special uniqueness properties of
Nonnegative MF (NMF) to prove identifiability of eTREE. The proposed model not
only exploits the tree structure prior, but also learns the hierarchical
clustering in an unsupervised data-driven fashion. We derive an efficient
algorithmic solution and a scalable implementation of eTREE that exploits
parallel computing, computation caching, and warm start strategies. We showcase
the effectiveness of eTREE on real data from various application domains:
healthcare, recommender systems, and education. We also demonstrate the
meaningfulness of the tree obtained from eTREE through interpretation by domain
experts.
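The idea in the abstract, a nonnegative factorization whose item embeddings are pulled toward the embedding of their parent node in a tree that is learned at the same time, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single-level tree (items under `n_clusters` parent nodes), a k-means-style reassignment of parents, and plain projected-gradient updates. The function name `tree_regularized_nmf`, the weight `lam`, and all other parameters are hypothetical.

```python
import numpy as np


def tree_regularized_nmf(X, rank, n_clusters, lam=1.0, n_iters=200, seed=0):
    """Illustrative sketch only (not the eTREE algorithm from the paper).

    Approximately minimizes
        0.5 * ||X - A @ B.T||_F^2 + 0.5 * lam * sum_i ||B[i] - S[parent[i]]||^2
    subject to A >= 0 and B >= 0, where B[i] is the embedding of item i and
    S[k] is the embedding of the k-th parent node of an assumed one-level tree.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_items = X.shape
    A = rng.random((n_rows, rank))
    B = rng.random((n_items, rank))
    # Initialize parent embeddings from randomly chosen item embeddings.
    S = B[rng.choice(n_items, size=n_clusters, replace=False)].copy()
    parent = np.zeros(n_items, dtype=int)

    for _ in range(n_iters):
        # Tree step 1: assign every item to its nearest parent embedding.
        dists = ((B[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
        parent = dists.argmin(axis=1)

        # Tree step 2: each parent embedding becomes the mean of its children.
        for k in range(n_clusters):
            members = parent == k
            if members.any():
                S[k] = B[members].mean(axis=0)

        # Projected gradient step on A; step size is 1 / Lipschitz constant.
        step_a = 1.0 / (np.linalg.norm(B.T @ B, 2) + 1e-12)
        A = np.maximum(A - step_a * ((A @ B.T - X) @ B), 0.0)

        # Projected gradient step on B, including the pull toward the parents.
        step_b = 1.0 / (np.linalg.norm(A.T @ A, 2) + lam + 1e-12)
        grad_b = (B @ A.T - X.T) @ A + lam * (B - S[parent])
        B = np.maximum(B - step_b * grad_b, 0.0)

    return A, B, S, parent
```

Calling `tree_regularized_nmf(X, rank=3, n_clusters=4)` on a nonnegative matrix `X` with items as columns returns the two factors, the parent embeddings, and the learned item-to-parent assignment. A deeper tree would need one regularization term per level, which this sketch leaves out.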
Related papers
- Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles [6.664930499708017]
The Shapley value (SV) is a concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions.
We present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models.
arXiv Detail & Related papers (2024-01-22T16:08:41Z)
- Effective and Efficient Federated Tree Learning on Hybrid Data [80.31870543351918]
We propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data.
We observe the existence of consistent split rules in trees and show that the knowledge of parties can be incorporated into the lower layers of a tree.
Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead.
arXiv Detail & Related papers (2023-10-18T10:28:29Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles [6.037383467521294]
We propose a flexible framework for learning tree ensembles to support arbitrary loss functions, missing responses, and multi-task learning.
Our framework builds on differentiable tree ensembles, which can be trained using first-order methods.
We show that our framework can lead to 100x more compact and 23% more expressive tree ensembles than those by popular toolkits.
arXiv Detail & Related papers (2022-05-19T17:30:49Z)
- Learning Latent and Hierarchical Structures in Cognitive Diagnosis Models [3.4646560112467037]
A key component of Cognitive Diagnosis Models (CDMs) is a binary $Q$-matrix characterizing the dependence structure between the items and the latent attributes.
This paper considers the problem of jointly learning these latent and hierarchical structures in CDMs from observed data.
An efficient expectation-maximization algorithm and a latent structure recovery algorithm are developed.
arXiv Detail & Related papers (2021-04-05T20:33:02Z)
- Exemplars can Reciprocate Principal Components [0.0]
Category Trees is a clustering method that creates tree structures that branch on category type and not feature.
The theory is demonstrated using the Portugal Forest Fires dataset as a case study.
arXiv Detail & Related papers (2021-03-22T12:46:29Z)
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
- Hierarchical Graph Capsule Network [78.4325268572233]
We propose hierarchical graph capsule network (HGCN) that can jointly learn node embeddings and extract graph hierarchies.
To learn the hierarchical representation, HGCN characterizes the part-whole relationship between lower-level capsules (part) and higher-level capsules (whole).
arXiv Detail & Related papers (2020-12-16T04:13:26Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module applicable to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)