Unsupervised Embedding of Hierarchical Structure in Euclidean Space
- URL: http://arxiv.org/abs/2010.16055v1
- Date: Fri, 30 Oct 2020 03:57:09 GMT
- Title: Unsupervised Embedding of Hierarchical Structure in Euclidean Space
- Authors: Jinyu Zhao, Yi Hao, Cyrus Rashtchian
- Abstract summary: We consider learning a non-linear embedding of data into Euclidean space as a way to improve the hierarchical clustering produced by agglomerative algorithms.
We show that rescaling the latent space embedding leads to improved results for both dendrogram purity and the Moseley-Wang cost function.
- Score: 30.507049058838025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep embedding methods have influenced many areas of unsupervised learning.
However, the best methods for learning hierarchical structure use non-Euclidean
representations, whereas Euclidean geometry underlies the theory behind many
hierarchical clustering algorithms. To bridge the gap between these two areas,
we consider learning a non-linear embedding of data into Euclidean space as a
way to improve the hierarchical clustering produced by agglomerative
algorithms. To learn the embedding, we revisit using a variational autoencoder
with a Gaussian mixture prior, and we show that rescaling the latent space
embedding and then applying Ward's linkage-based algorithm leads to improved
results for both dendrogram purity and the Moseley-Wang cost function. Finally,
we complement our empirical results with a theoretical explanation of the
success of this approach. We study a synthetic model of the embedded vectors
and prove that Ward's method exactly recovers the planted hierarchical
clustering with high probability.
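The clustering stage of this pipeline is easy to sketch with standard tools. In the minimal Python sketch below, the norm-dependent stretch is an assumed stand-in for the paper's rescaling scheme (whose exact form is not reproduced here), and the VAE latents are mocked by a toy Gaussian mixture:
```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def ward_on_rescaled(latents, alpha=1.0):
    """latents: (n, d) VAE latent means. Stretch each point by a power of its
    norm before clustering; alpha = 0 recovers plain Ward on the raw latents."""
    norms = np.linalg.norm(latents, axis=1, keepdims=True)
    z = latents * norms**alpha          # norm-dependent stretch (stand-in rescaling)
    return linkage(z, method="ward")    # merges minimizing within-cluster variance

# Toy usage: two well-separated Gaussian bundles standing in for VAE latents
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(mu, 0.1, size=(50, 8)) for mu in (0.5, 1.5)])
tree = ward_on_rescaled(z, alpha=1.0)
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the dendrogram at 2 clusters
```
Dendrogram purity and the Moseley-Wang cost would then be evaluated on the linkage matrix `tree`.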
Related papers
- Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks such as classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
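The coding-rate metric referenced here is the standard rate-distortion estimate R(Z) = 1/2 log det(I + d/(n ε²) ZᵀZ). Below is a hedged sketch of an anti-collapse penalty built on it; the paper's exact loss formulation may differ:
```python
import torch
import torch.nn.functional as F

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """R(Z) = 1/2 logdet(I + d/(n*eps^2) Z^T Z) for row embeddings Z of shape (n, d)."""
    n, d = Z.shape
    I = torch.eye(d, device=Z.device, dtype=Z.dtype)
    return 0.5 * torch.logdet(I + (d / (n * eps**2)) * Z.T @ Z)

def anti_collapse_penalty(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    # Collapsed embeddings occupy a low-dimensional subspace and have near-zero
    # coding rate, so subtracting R(Z) from the metric-learning loss keeps the
    # embedding space spread out.
    return -coding_rate(F.normalize(Z, dim=1), eps)
```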
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
- Unfolding ADMM for Enhanced Subspace Clustering of Hyperspectral Images [43.152314090830174]
We introduce an innovative clustering architecture for hyperspectral images (HSI) by unfolding an iterative solver based on the Alternating Direction Method of Multipliers (ADMM) for sparse subspace clustering.
Our approach captures the structural characteristics of HSI data well by employing the K-nearest-neighbors algorithm in a structure-preservation module.
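As a rough illustration of what unfolding ADMM for sparse subspace clustering means, the sketch below unrolls K ADMM iterations of the self-expression model X ≈ XC with a learnable penalty per layer. It is an assumption-laden toy, and the KNN structure-preservation module is not reproduced:
```python
import torch
import torch.nn as nn

class UnrolledADMM(nn.Module):
    """Hedged sketch: K unfolded ADMM iterations for the sparse self-expression
    model X ≈ XC, with one learnable penalty rho per layer. Illustrative only."""

    def __init__(self, K: int = 5, lam: float = 10.0):
        super().__init__()
        self.lam = lam
        self.log_rho = nn.Parameter(torch.zeros(K))   # learnable per-layer penalty

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        n = X.shape[1]                                # X: (d, n), columns are spectra
        G = self.lam * (X.T @ X)                      # Gram term of the fidelity objective
        I = torch.eye(n, device=X.device, dtype=X.dtype)
        C = torch.zeros(n, n, device=X.device, dtype=X.dtype)
        U = torch.zeros_like(C)
        for log_rho in self.log_rho:
            rho = log_rho.exp()
            A = torch.linalg.solve(G + rho * I, G + rho * C - U)  # least-squares step
            V = A + U / rho
            C = torch.sign(V) * torch.clamp(V.abs() - 1.0 / rho, min=0)  # soft-threshold
            C = C * (1.0 - I)                          # zero diagonal: no self-representation
            U = U + rho * (A - C)
        return C   # |C| + |C|^T then serves as an affinity for spectral clustering
```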
arXiv Detail & Related papers (2024-04-10T15:51:46Z)
- Exploiting Data Hierarchy as a New Modality for Contrastive Learning [0.0]
This work investigates how hierarchically structured data can help neural networks learn conceptual representations of cathedrals.
The underlying WikiScenes dataset provides a spatially organized hierarchical structure of cathedral components.
We propose a novel hierarchical contrastive training approach that leverages a triplet margin loss to represent the data's spatial hierarchy in the encoder's latent space.
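A hedged sketch of such a hierarchy-aware triplet objective is below; the root-to-leaf `paths` encoding and the sampling rule are illustrative assumptions, not the paper's implementation:
```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.2)

def shared_depth(pa, pb):
    """Length of the common prefix of two root-to-leaf paths in the hierarchy."""
    k = 0
    while k < min(len(pa), len(pb)) and pa[k] == pb[k]:
        k += 1
    return k

def hierarchy_triplet_loss(z, paths):
    """z: (n, d) embeddings; paths[i]: root-to-leaf node path of sample i (an
    assumed encoding of the WikiScenes hierarchy). Each anchor is paired with
    the sample sharing its deepest common ancestor (positive) and its
    shallowest one (negative), so latent distance tracks tree distance."""
    losses, n = [], len(paths)
    for a in range(n):
        others = [b for b in range(n) if b != a]
        p = max(others, key=lambda b: shared_depth(paths[a], paths[b]))
        q = min(others, key=lambda b: shared_depth(paths[a], paths[b]))
        if shared_depth(paths[a], paths[p]) > shared_depth(paths[a], paths[q]):
            losses.append(triplet(z[a:a+1], z[p:p+1], z[q:q+1]))
    return torch.stack(losses).mean() if losses else z.new_zeros(())
```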
arXiv Detail & Related papers (2024-01-06T21:47:49Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
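The recommended variant is simple to state in code; a naive O(n³) sketch:
```python
import numpy as np

def average_dot_linkage(X: np.ndarray):
    """Agglomerative clustering that merges the pair of clusters with the
    *maximum average dot product* between their points, instead of the
    minimum distance or within-cluster variance. Returns the merge sequence."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = np.mean(X[clusters[i]] @ X[clusters[j]].T)  # avg pairwise dot product
                if score > best:
                    best, best_pair = score, (i, j)
        i, j = best_pair
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # merge the winning pair
        del clusters[j]
    return merges
```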
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
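A minimal sketch of the learning setup the separation concerns, with illustrative (not the paper's) width, ball radius, and step size:
```python
import torch
import torch.nn as nn

# Gradient descent on a depth-2 network with sigmoidal activations,
# fit to the indicator of a centered ball.
d, n = 4, 2048
X = torch.randn(n, d)
y = (X.norm(dim=1) <= 2.0).float()         # target: 1 inside the ball, 0 outside

net = nn.Sequential(nn.Linear(d, 64), nn.Sigmoid(),
                    nn.Linear(64, 1), nn.Sigmoid())
opt = torch.optim.SGD(net.parameters(), lr=0.5)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy(net(X).squeeze(1), y)
    loss.backward()
    opt.step()
acc = ((net(X).squeeze(1) > 0.5).float() == y).float().mean()
```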
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Non-Euclidean Self-Organizing Maps [0.0]
We present the generalized setup for non-Euclidean SOMs.
We improve the traditional SOM algorithm by introducing topology-related extensions.
Our proposed method can be successfully applied to dimensionality reduction, clustering, or finding similarities in big data.
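For orientation, a bare-bones SOM update is sketched below. The paper's generalization amounts to replacing the lattice metric `grid_dist` with a non-Euclidean one; the topology-related extensions themselves are not reproduced here:
```python
import numpy as np

def som_step(W, grid, x, lr=0.1, sigma=1.0,
             grid_dist=lambda a, b: np.linalg.norm(a - b)):
    """One SOM update. W: (m, d) neuron weights; grid: (m, 2) neuron
    coordinates on the lattice; x: (d,) input sample. The lattice geometry
    enters only through grid_dist, the knob a non-Euclidean SOM changes."""
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
    for k in range(len(W)):
        h = np.exp(-grid_dist(grid[k], grid[bmu])**2 / (2 * sigma**2))  # neighborhood kernel
        W[k] += lr * h * (x - W[k])                 # pull neighbors toward x
    return W
```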
arXiv Detail & Related papers (2021-09-24T06:57:15Z)
- Functorial Manifold Learning [1.14219428942199]
We first characterize manifold learning algorithms as functors that map pseudometric spaces to optimization objectives.
We then use this characterization to prove refinement bounds on manifold learning loss functions and construct a hierarchy of manifold learning algorithms.
We express several popular manifold learning algorithms as functors at different levels of this hierarchy, including Metric Multidimensional Scaling, IsoMap, and UMAP.
arXiv Detail & Related papers (2020-11-15T02:30:23Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched-prior-based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure- and geometrical-structure-constrained neural network.
Our network achieves state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Hierarchical Correlation Clustering and Tree Preserving Embedding [3.821323038670061]
We propose a hierarchical correlation clustering method that extends the well-known correlation clustering method.
We then study unsupervised representation learning with this hierarchical correlation clustering.
arXiv Detail & Related papers (2020-02-18T17:44:25Z)