From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical
Clustering
- URL: http://arxiv.org/abs/2010.00402v1
- Date: Thu, 1 Oct 2020 13:43:19 GMT
- Title: From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical
Clustering
- Authors: Ines Chami, Albert Gu, Vaggos Chatziafratis and Christopher Ré
- Abstract summary: We present the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees.
We show that even approximate solutions found with gradient descent have better clustering quality than agglomerative heuristics.
We also highlight the flexibility of HypHC using end-to-end training in a downstream classification task.
- Score: 33.000371053304676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similarity-based Hierarchical Clustering (HC) is a classical unsupervised
machine learning algorithm that has traditionally been solved with heuristic
algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete
optimization problem by introducing a global cost function measuring the
quality of a given tree. In this work, we provide the first continuous
relaxation of Dasgupta's discrete optimization problem with provable quality
guarantees. The key idea of our method, HypHC, is showing a direct
correspondence from discrete trees to continuous representations (via the
hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm
that maps leaf embeddings to a dendrogram), allowing us to search the space of
discrete binary trees with continuous optimization. Building on analogies
between trees and hyperbolic space, we derive a continuous analogue for the
notion of lowest common ancestor, which leads to a continuous relaxation of
Dasgupta's discrete objective. We can show that after decoding, the global
minimizer of our continuous relaxation yields a discrete tree with a (1 +
epsilon)-factor approximation for Dasgupta's optimal tree, where epsilon can be
made arbitrarily small and controls optimization challenges. We experimentally
evaluate HypHC on a variety of HC benchmarks and find that even approximate
solutions found with gradient descent have better clustering quality than
agglomerative heuristics or other gradient-based algorithms. Finally, we
highlight the flexibility of HypHC using end-to-end training in a downstream
classification task.
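The correspondence described above can be made concrete. Dasgupta's discrete
cost for a similarity matrix w and a binary tree T is
C(T) = sum_{i,j} w_ij * |leaves(T[i v j])|, where T[i v j] is the subtree rooted
at the lowest common ancestor (LCA) of leaves i and j. HypHC embeds each leaf as
a point in the Poincare disk, uses the point on the geodesic between two leaf
embeddings that lies closest to the origin as a continuous analogue of their
LCA, and relaxes an equivalent triplet form of Dasgupta's cost by replacing the
hard "which pair has the deepest LCA" decision with a softmax over LCA depths.
The sketch below is a minimal NumPy illustration of that idea, not the authors'
released implementation: the grid search along the geodesic, the temperature
tau, the toy similarity matrix W, and the omission of the constant additive
term are simplifications introduced here, and the paper's closed-form LCA depth
and exact objective may differ in detail.

import numpy as np

def poincare_dist(x, y, eps=1e-9):
    # Hyperbolic distance between two points of the open unit (Poincare) ball.
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))

def mobius_add(x, y, eps=1e-9):
    # Mobius addition, used to parameterize geodesics in the Poincare ball.
    xy = np.dot(x, y)
    nx, ny = np.sum(x ** 2), np.sum(y ** 2)
    num = (1.0 + 2.0 * xy + ny) * x + (1.0 - nx) * y
    return num / max(1.0 + 2.0 * xy + nx * ny, eps)

def geodesic(x, y, t, eps=1e-9):
    # Point at fraction t in [0, 1] along the geodesic from x to y.
    v = mobius_add(-x, y)
    nv = np.linalg.norm(v)
    if nv < eps:
        return x
    return mobius_add(x, np.tanh(t * np.arctanh(min(nv, 1.0 - eps))) * v / nv)

def lca_depth(x, y, n_steps=64):
    # Continuous analogue of LCA depth: distance from the origin to the closest
    # point on the geodesic between x and y (approximated here by a grid search).
    origin = np.zeros_like(x)
    return min(poincare_dist(origin, geodesic(x, y, t))
               for t in np.linspace(0.0, 1.0, n_steps))

def relaxed_dasgupta_cost(Z, W, tau=0.1):
    # Softmax-based triplet relaxation of a Dasgupta-style cost (constant terms
    # dropped). Z: (n, 2) leaf embeddings inside the unit disk, W: (n, n)
    # pairwise similarities. Placing similar pairs so that their continuous LCA
    # is deepest (farthest from the origin) lowers the cost.
    n = Z.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                w = np.array([W[i, j], W[i, k], W[j, k]])
                d = np.array([lca_depth(Z[i], Z[j]),
                              lca_depth(Z[i], Z[k]),
                              lca_depth(Z[j], Z[k])])
                soft = np.exp(d / tau) / np.sum(np.exp(d / tau))
                total += np.sum(w) - np.dot(w, soft)
    return total

# Toy check: items {0, 1} and {2, 3} are similar; an embedding that places the
# similar pairs close together should get a lower relaxed cost than a scrambled one.
W = np.array([[0.0, 1.0, 0.1, 0.1],
              [1.0, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 1.0],
              [0.1, 0.1, 1.0, 0.0]])
Z_good = 0.7 * np.array([[0.9, 0.1], [0.8, 0.2], [-0.9, -0.1], [-0.8, -0.2]])
Z_bad = 0.7 * np.array([[0.9, 0.1], [-0.9, -0.1], [0.8, 0.2], [-0.8, -0.2]])
print(relaxed_dasgupta_cost(Z_good, W), relaxed_dasgupta_cost(Z_bad, W))

In the paper's setting, the leaf embeddings would be optimized with gradient
descent on the full version of such a relaxed cost (constrained to the Poincare
disk), and a decoding step would then map the optimized embeddings back to a
dendrogram, e.g. by agglomerating pairs in order of decreasing continuous LCA depth.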
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Optimal estimation of Gaussian (poly)trees [25.02920605955238]
We consider both the problem of distribution learning (i.e., in KL distance) and that of structure learning (i.e., exact recovery).
The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently.
The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning.
arXiv Detail & Related papers (2024-02-09T12:58:36Z) - Sample-and-Bound for Non-Convex Optimization [18.30858789210194]
We propose new sampling-based methods for non-convex optimization that adapt Monte Carlo tree search to improve efficiency.
We evaluate the proposed algorithms against competitive baselines on challenging benchmarks.
arXiv Detail & Related papers (2024-01-09T20:45:47Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - EM's Convergence in Gaussian Latent Tree Models [22.987933817370305]
We show that the unique non-trivial stationary point of the population log-likelihood is its global maximum.
We establish that the expectation-maximization algorithm is guaranteed to converge to it in the single latent variable case.
arXiv Detail & Related papers (2022-11-21T23:12:58Z) - Discrete Tree Flows via Tree-Structured Permutations [5.929956715430168]
Discrete flow-based models cannot be straightforwardly optimized with conventional deep learning methods because gradients of discrete functions are undefined or zero.
Our approach seeks to reduce computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees.
arXiv Detail & Related papers (2022-07-04T23:11:04Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - CITS: Coherent Ising Tree Search Algorithm Towards Solving Combinatorial
Optimization Problems [0.0]
This paper proposes a search algorithm that expands the search space from a Markov chain to a depth-limited tree based on simulated annealing (SA).
At each iteration, the algorithm selects the best near-optimal solution within the feasible search space by exploring along the tree in a look-ahead fashion.
Our results show that, compared with standalone SA and CIM, our high-level tree search strategy is able to provide solutions within fewer epochs for Ising-formulated NP optimization problems.
arXiv Detail & Related papers (2022-03-09T10:07:26Z) - Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian
Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)