Hierarchical clustering: visualization, feature importance and model selection
- URL: http://arxiv.org/abs/2112.01372v1
- Date: Tue, 30 Nov 2021 20:38:17 GMT
- Title: Hierarchical clustering: visualization, feature importance and model selection
- Authors: Luben M. C. Cabezas, Rafael Izbicki, Rafael B. Stern
- Abstract summary: We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram.
The key insight behind the proposed methods is to view a dendrogram as a phylogeny.
Real and simulated datasets provide evidence that our proposed framework has desirable outcomes.
- Score: 4.017760528208122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose methods for the analysis of hierarchical clustering that fully use
the multi-resolution structure provided by a dendrogram. Specifically, we
propose a loss for choosing between clustering methods, a feature importance
score and a graphical tool for visualizing the segmentation of features in a
dendrogram. Current approaches to these tasks lead to loss of information since
they require the user to generate a single partition of the instances by
cutting the dendrogram at a specified level. Our proposed methods, instead, use
the full structure of the dendrogram. The key insight behind the proposed
methods is to view a dendrogram as a phylogeny. This analogy permits the
assignment of a feature value to each internal node of a tree through ancestral
state reconstruction. Real and simulated datasets provide evidence that our
proposed framework has desirable outcomes. We provide an R package that
implements our methods.
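
The idea of assigning a feature value to each internal node of a dendrogram can be illustrated with a minimal sketch. It is written in Python with SciPy rather than with the authors' R package, and it assumes a deliberately simple reconstruction rule: each internal node receives the size-weighted mean of its two children. That rule is only a stand-in for ancestral state reconstruction, not the method from the paper, and the function name `reconstruct_internal_values` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def reconstruct_internal_values(Z, leaf_values):
    """Assign a feature value to every node of a dendrogram.

    Z is a SciPy linkage matrix; leaf_values holds one observed feature
    value per instance. Leaves keep their observed value. Each internal
    node gets the size-weighted mean of its two children (an assumed,
    simplified stand-in for ancestral state reconstruction).
    """
    n = len(leaf_values)
    values = np.empty(2 * n - 1)
    sizes = np.ones(2 * n - 1)
    values[:n] = leaf_values
    # Row i of Z describes the merge that creates node n + i.
    for i, (a, b, _dist, count) in enumerate(Z):
        a, b = int(a), int(b)
        node = n + i
        sizes[node] = count
        values[node] = (sizes[a] * values[a] + sizes[b] * values[b]) / count
    return values


# Toy data: two well-separated groups of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])

Z = linkage(X, method="average")
node_values = reconstruct_internal_values(Z, X[:, 0])
```

With values attached to every node, a feature can be inspected at all resolutions of the tree at once instead of at a single cut. Under this simplified rule the root value equals the overall mean of the feature.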
Related papers
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering [13.638434337947302]
FSSMSC is a novel solution to the high computational complexity commonly found in existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
arXiv Detail & Related papers (2024-08-11T06:54:00Z)
- Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations [15.59251297818324]
We present an approach for analyzing grouping information contained within a neural network's activations.
We exploit features from all layers, obviating the need to guess which part of the model contains relevant information.
arXiv Detail & Related papers (2023-12-11T01:20:34Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase, it models the distribution of features associated with the nodes of the given graph; in the second, it complements the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z)
- SHGNN: Structure-Aware Heterogeneous Graph Neural Network [77.78459918119536]
This paper proposes a novel Structure-Aware Heterogeneous Graph Neural Network (SHGNN) to address the above limitations.
We first utilize a feature propagation module to capture the local structure information of intermediate nodes in the meta-path.
Next, we use a tree-attention aggregator to incorporate the graph structure information into the aggregation module on the meta-path.
Finally, we leverage a meta-path aggregator to fuse the information aggregated from different meta-paths.
arXiv Detail & Related papers (2021-12-12T14:18:18Z)
- Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the similarity between graphs of different views by minimizing the tensor Schatten p-norm.
Our proposed algorithm is time-economical, obtains stable results, and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z)
- Towards Clustering-friendly Representations: Subspace Clustering via Graph Filtering [16.60975509085194]
We propose a graph filtering approach by which a smooth representation is achieved.
Experiments on image and document clustering datasets demonstrate that our method improves upon state-of-the-art subspace clustering techniques.
An ablation study shows that graph filtering can remove noise, preserve structure in the image, and increase the separability of classes.
arXiv Detail & Related papers (2021-06-18T02:21:36Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Graph Neural Networks with Composite Kernels [60.81504431653264]
We re-interpret node aggregation from the perspective of kernel weighting.
We present a framework to consider feature similarity in an aggregation scheme.
We propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.
arXiv Detail & Related papers (2020-05-16T04:44:29Z)
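
One of the related entries above recommends a variant of agglomerative clustering in which clusters are merged by maximum average dot product rather than by minimum distance. The merge rule can be sketched as follows. This is a naive illustration written for readability, not the implementation from that paper; the function name `dot_product_agglomerative` and the toy data are assumptions.

```python
import numpy as np


def dot_product_agglomerative(X):
    """Agglomerative clustering that merges by maximum average dot product.

    Returns a list of merges (id_a, id_b, similarity). Leaves are numbered
    0..n-1 and each new cluster gets the next free id, as in SciPy linkage
    matrices. Naive implementation: every cluster pair is rescored at
    every step.
    """
    n = X.shape[0]
    gram = X @ X.T  # pairwise dot products between instances
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for p in range(len(ids)):
            for q in range(p + 1, len(ids)):
                a, b = ids[p], ids[q]
                # Average dot product over all cross-cluster pairs.
                sim = gram[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or sim > best[2]:
                    best = (a, b, sim)
        a, b, sim = best
        merges.append((a, b, float(sim)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Toy data: two pairs of nearly parallel vectors.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
merges = dot_product_agglomerative(X)
```

On this toy input the two nearly parallel pairs are merged first, and the two resulting clusters are joined last.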