Reducing the dimensionality and granularity in hierarchical categorical variables
- URL: http://arxiv.org/abs/2403.03613v2
- Date: Mon, 19 Aug 2024 13:29:10 GMT
- Title: Reducing the dimensionality and granularity in hierarchical categorical variables
- Authors: Paul Wilsens, Katrien Antonio, Gerda Claeskens
- Abstract summary: We propose a methodology to obtain a reduced representation of a hierarchical categorical variable.
We show how entity embedding can be applied in a hierarchical setting.
We apply our methodology to a real dataset and find that the reduced hierarchy is an improvement over the original hierarchical structure.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Hierarchical categorical variables often exhibit many levels (high granularity) and many classes within each level (high dimensionality). This may cause overfitting and estimation issues when including such covariates in a predictive model. In current literature, a hierarchical covariate is often incorporated via nested random effects. However, this does not facilitate the assumption of classes having the same effect on the response variable. In this paper, we propose a methodology to obtain a reduced representation of a hierarchical categorical variable. We show how entity embedding can be applied in a hierarchical setting. Subsequently, we propose a top-down clustering algorithm which leverages the information encoded in the embeddings to reduce both the within-level dimensionality and the overall granularity of the hierarchical categorical variable. In simulation experiments, we show that our methodology can effectively approximate the true underlying structure of a hierarchical covariate in terms of the effect on a response variable, and find that incorporating the reduced hierarchy improves the balance between model fit and complexity. We apply our methodology to a real dataset and find that the reduced hierarchy is an improvement over the original hierarchical structure and reduced structures proposed in the literature.
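The pipeline the abstract describes (entity embeddings, then clustering within a level to merge classes with similar effects) might be sketched roughly as follows. The embeddings here are random stand-ins for learned entity embeddings, and the greedy centroid-distance merge with a `threshold` parameter is an illustrative assumption, not the paper's actual algorithm.

```python
import numpy as np

def reduce_level(embeddings, labels, threshold=0.5):
    """Greedily merge classes whose embeddings lie within `threshold`
    of an existing group centroid.

    embeddings: (n_classes, d) array of entity embeddings for one level.
    labels: class names, in the same order as the rows of `embeddings`.
    Returns a dict mapping each original class to its merged-group label.
    """
    groups = []  # each group is a list of row indices
    for i in range(len(labels)):
        placed = False
        for g in groups:
            centroid = embeddings[g].mean(axis=0)
            if np.linalg.norm(embeddings[i] - centroid) < threshold:
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    mapping = {}
    for k, g in enumerate(groups):
        for i in g:
            mapping[labels[i]] = f"group_{k}"
    return mapping

# Toy example: two pairs of near-coincident classes collapse to two groups.
emb = np.array([[0.0, 0.0], [0.05, 0.0], [3.0, 3.0], [3.05, 3.0]])
mapping = reduce_level(emb, ["A", "B", "C", "D"], threshold=0.5)
```

Applied recursively from the top level down, a merge step like this reduces within-level dimensionality; collapsing a level whose classes all merge into their parents would reduce granularity.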
Related papers
- Harnessing Superclasses for Learning from Hierarchical Databases [1.835004446596942]
In many large-scale classification problems, classes are organized in a known hierarchy, typically represented as a tree.
We introduce a loss for this type of supervised hierarchical classification.
Our approach does not entail any significant additional computational cost compared with the cross-entropy loss.
arXiv Detail & Related papers (2024-11-25T14:39:52Z)
- CAVIAR: Categorical-Variable Embeddings for Accurate and Robust Inference [0.2209921757303168]
Social science research often hinges on the relationship between categorical variables and outcomes.
We introduce CAVIAR, a novel method for embedding categorical variables that assume values in a high-dimensional ambient space but are sampled from an underlying manifold.
arXiv Detail & Related papers (2024-04-07T14:47:07Z)
- Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
arXiv Detail & Related papers (2023-10-26T04:54:39Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
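The merge rule described above can be sketched in a few lines: at each step, join the pair of clusters with the highest average cross-cluster dot product. This toy implementation is an illustration of that rule, not the authors' code.

```python
import numpy as np

def agglomerate_by_dot(X):
    """Agglomerative clustering that merges the pair of clusters with the
    highest average pairwise dot product, recording the merge order."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average dot product over all cross-cluster pairs
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best:
                    best, best_pair = score, (a, b)
        a, b = best_pair
        merges.append((clusters[a][:], clusters[b][:]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Two tight pairs of points: the pairs merge first, then the two pairs merge.
X = np.array([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]])
merges = agglomerate_by_dot(X)
```

Note the contrast with standard linkages: minimum distance or within-cluster variance would behave similarly on this toy data, but the dot-product rule is what the paper argues recovers generative hierarchical structure.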
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
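One simple way to realize a hierarchy-aware penalty on a contrastive loss is to weight each positive pair by the depth of the two labels' lowest common ancestor, so that separating samples with deep shared ancestry is penalized more. The weighting scheme below is an illustrative assumption, not the paper's exact loss.

```python
import numpy as np

def lca_depth(path_a, path_b):
    """Depth of the lowest common ancestor of two label paths (root = 0)."""
    d = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        d += 1
    return d

def hierarchical_contrastive_loss(Z, paths, temperature=0.5):
    """Toy hierarchy-weighted supervised contrastive loss.

    Z: (n, d) L2-normalized embeddings; paths: one label path per sample,
    e.g. ("animal", "dog"). Pairs sharing a deeper ancestor get a larger
    weight, enforcing the hierarchy constraint on the embedding space.
    """
    n = len(Z)
    sim = Z @ Z.T / temperature
    loss, count = 0.0, 0
    for i in range(n):
        logits = np.delete(sim[i], i)          # exclude self-similarity
        log_denom = np.log(np.exp(logits).sum())
        for j in range(n):
            if j == i:
                continue
            w = lca_depth(paths[i], paths[j])
            if w > 0:  # pairs sharing any ancestor act as weighted positives
                loss += w * (log_denom - sim[i, j])
                count += w
    return loss / max(count, 1)

# Embeddings aligned with the label hierarchy score lower than scrambled ones.
paths = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
Z_aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
Z_scrambled = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
loss_aligned = hierarchical_contrastive_loss(Z_aligned, paths)
loss_scrambled = hierarchical_contrastive_loss(Z_scrambled, paths)
```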
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
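A minimal instance of such a hierarchical Bayesian bandit: arm means share a latent parent mean, and a Thompson-sampling agent draws exact joint posterior samples (the shared mean first, then each arm's mean given it). The two-level Gaussian conjugate setup and all hyperparameters below are a generic illustration, not the paper's formulation.

```python
import numpy as np

def hierarchical_thompson(true_means, horizon=1000, s0=1.0, s1=0.5,
                          obs_sd=1.0, seed=0):
    """Thompson sampling under mu ~ N(0, s0^2), theta_k | mu ~ N(mu, s1^2),
    reward | arm k ~ N(theta_k, obs_sd^2). Returns the average reward."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    n = np.zeros(K)       # pull counts per arm
    total = np.zeros(K)   # reward sums per arm
    reward_sum = 0.0
    for _ in range(horizon):
        # Marginal posterior of the shared mean mu (theta integrated out):
        # arm k's sample mean is N(mu, s1^2 + obs_sd^2 / n_k).
        prec_k = n / (n * s1**2 + obs_sd**2)
        mu_prec = 1.0 / s0**2 + prec_k.sum()
        ybar = np.divide(total, n, out=np.zeros(K), where=n > 0)
        mu_mean = (prec_k * ybar).sum() / mu_prec
        mu = rng.normal(mu_mean, 1.0 / np.sqrt(mu_prec))
        # theta_k | mu, data: conjugate Gaussian update toward mu
        th_prec = n / obs_sd**2 + 1.0 / s1**2
        th_mean = (total / obs_sd**2 + mu / s1**2) / th_prec
        theta = rng.normal(th_mean, 1.0 / np.sqrt(th_prec))
        arm = int(np.argmax(theta))           # act greedily on the sample
        r = rng.normal(true_means[arm], obs_sd)
        n[arm] += 1
        total[arm] += r
        reward_sum += r
    return reward_sum / horizon

# With one clearly best arm, the average reward approaches its mean.
avg = hierarchical_thompson([0.2, 0.25, 1.0], horizon=1000)
```

Sharing the parent mean is what exploits the correlation between arms: evidence from any arm shrinks the posterior for all of them, which is the statistical-efficiency gain the summary refers to.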
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Label Hierarchy Transition: Delving into Class Hierarchies to Enhance Deep Classifiers [40.993137740456014]
We propose a unified probabilistic framework based on deep learning to address the challenges of hierarchical classification.
The proposed framework can be readily adapted to any existing deep network with only minor modifications.
We extend our proposed LHT framework to the skin lesion diagnosis task and validate its great potential in computer-aided diagnosis.
arXiv Detail & Related papers (2021-12-04T14:58:36Z)
- Inducing a hierarchy for multi-class classification problems [11.58041597483471]
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not.
In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers.
We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.
arXiv Detail & Related papers (2021-02-20T05:40:42Z)
- Pitfalls of Assessing Extracted Hierarchies for Multi-Class Classification [4.89253144446913]
We identify some common pitfalls that may lead practitioners to make misleading conclusions about their methods.
We show how the hierarchy's quality can become irrelevant depending on the experimental setup.
Our results confirm that datasets with a high number of classes generally present complex structures in how these classes relate to each other.
arXiv Detail & Related papers (2021-01-26T21:50:57Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
- Understanding and Diagnosing Vulnerability under Adversarial Attacks [62.661498155101654]
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks.
We propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables.
We also design the first diagnostic method to quantify the vulnerability contributed by each layer.
arXiv Detail & Related papers (2020-07-17T01:56:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.