HyperAid: Denoising in hyperbolic spaces for tree-fitting and
hierarchical clustering
- URL: http://arxiv.org/abs/2205.09721v1
- Date: Thu, 19 May 2022 17:33:16 GMT
- Title: HyperAid: Denoising in hyperbolic spaces for tree-fitting and
hierarchical clustering
- Authors: Eli Chien, Puoya Tabaghi, Olgica Milenkovic
- Abstract summary: We propose a new approach to treemetric denoising (HyperAid) in hyperbolic spaces.
It transforms original data into data that is ``more'' tree-like, when evaluated in terms of Gromov's $\delta$ hyperbolicity.
We integrate HyperAid with schemes for enforcing nonnegative edge-weights.
- Score: 36.738414547278154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of fitting distances by tree-metrics has received significant
attention in the theoretical computer science and machine learning communities
alike, due to many applications in natural language processing, phylogeny,
cancer genomics and a myriad of problem areas that involve hierarchical
clustering. Despite the existence of several provably exact algorithms for
tree-metric fitting of data that inherently obeys tree-metric constraints, much
less is known about how to best fit tree-metrics for data whose structure
moderately (or substantially) differs from a tree. For such noisy data, most
available algorithms perform poorly and often produce negative edge weights in
representative trees. Furthermore, it is currently not known how to choose the
most suitable approximation objective for noisy fitting. Our contributions are
as follows. First, we propose a new approach to tree-metric denoising
(HyperAid) in hyperbolic spaces which transforms the original data into data
that is ``more'' tree-like, when evaluated in terms of Gromov's $\delta$
hyperbolicity. Second, we perform an ablation study involving two choices for
the approximation objective, $\ell_p$ norms and the Dasgupta loss. Third, we
integrate HyperAid with schemes for enforcing nonnegative edge-weights. As a
result, the HyperAid platform outperforms all other existing methods in the
literature, including Neighbor Joining (NJ), TreeRep and T-REX, both on
synthetic and real-world data. Synthetic data is represented by edge-augmented
trees and shortest-distance metrics while the real-world datasets include Zoo,
Iris, Glass, Segmentation and SpamBase; on these datasets, the average
improvement with respect to NJ is $125.94\%$.
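For concreteness, Gromov's $\delta$ hyperbolicity can be evaluated through the four-point condition: a metric is a tree metric exactly when $\delta = 0$, so a smaller $\delta$ after denoising means the data is ``more'' tree-like. Below is a minimal brute-force sketch of that evaluation criterion (an illustration only, not the HyperAid pipeline itself):

```python
import numpy as np
from itertools import combinations

def gromov_delta(D):
    """Brute-force Gromov delta of a finite metric (distance matrix D) via
    the four-point condition: for each quadruple, sort the three pairwise
    sums d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z); delta is half the
    maximal gap between the two largest. Tree metrics give delta = 0."""
    n = D.shape[0]
    delta = 0.0
    for x, y, z, w in combinations(range(n), 4):
        s = sorted([D[x, y] + D[z, w],
                    D[x, z] + D[y, w],
                    D[x, w] + D[y, z]])
        delta = max(delta, (s[2] - s[1]) / 2.0)
    return delta

# Sanity check: shortest-path metric of the unit-weight path a-b-c-d (a tree)
D_tree = np.array([[0, 1, 2, 3],
                   [1, 0, 1, 2],
                   [2, 1, 0, 1],
                   [3, 2, 1, 0]], dtype=float)
assert gromov_delta(D_tree) == 0.0
```

The exact enumeration is $O(n^4)$, so in practice $\delta$ is often estimated on sampled quadruples for larger point sets.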
Related papers
- Fitting trees to $\ell_1$-hyperbolic distances [4.220336689294244]
Building trees to represent or to fit distances is a critical component of phylogenetic analysis.
We study the tree fitting problem as finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding.
arXiv Detail & Related papers (2024-09-02T07:38:32Z)
- Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union [113.20223082664681]
We propose the use of fine-grained mIoUs along with corresponding worst-case metrics.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
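To make the contrast concrete, here is a minimal sketch of the standard dataset-level mIoU next to one plausible fine-grained, image-wise variant; `imagewise_miou` is an illustrative reading of "fine-grained", and the paper's exact definitions and worst-case metrics may differ:

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    """Per-class IoU for one image: intersection / union. Classes absent
    from both prediction and ground truth are returned as NaN."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union > 0:
            ious[c] = np.logical_and(p, g).sum() / union
    return ious

def dataset_miou(preds, gts, num_classes):
    """Standard dataset-level mIoU: one confusion pooled over all images,
    which is dominated by large, frequent objects."""
    inter, union = np.zeros(num_classes), np.zeros(num_classes)
    for pred, gt in zip(preds, gts):
        for c in range(num_classes):
            p, g = pred == c, gt == c
            inter[c] += np.logical_and(p, g).sum()
            union[c] += np.logical_or(p, g).sum()
    valid = union > 0
    return float(np.mean(inter[valid] / union[valid]))

def imagewise_miou(preds, gts, num_classes):
    """A finer-grained alternative: score each image separately, then
    average, so small objects in any image carry equal weight."""
    per_image = [np.nanmean(iou_per_class(p, g, num_classes))
                 for p, g in zip(preds, gts)]
    return float(np.mean(per_image))
```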
arXiv Detail & Related papers (2023-10-30T03:45:15Z)
- Hyperbolic Random Forests [15.992363138277442]
We generalize the well-known random forests to hyperbolic space.
We do this by redefining the notion of a split using horospheres.
We also outline a new method for combining classes based on their lowest common ancestor and a class-balanced version of the large-margin loss.
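A horosphere is a level set of a Busemann function, so one way to picture such a split is thresholding the Busemann value at an ideal boundary point of the Poincaré ball. The sketch below is a hedged illustration under that reading; `horosphere_split` and its threshold parameterization are assumptions for exposition, not the paper's implementation:

```python
import numpy as np

def busemann(x, p):
    """Busemann function of the Poincare ball at ideal point p (||p|| = 1),
    using the standard form B_p(x) = log(||p - x||^2 / (1 - ||x||^2)).
    Its level sets are horospheres (sign conventions vary)."""
    return np.log(np.sum((p - x) ** 2, axis=-1)
                  / (1.0 - np.sum(x ** 2, axis=-1)))

def horosphere_split(X, p, t):
    """Binary split of points X inside the unit ball by the horosphere
    B_p(x) = t; the hyperbolic analogue of a threshold split."""
    b = busemann(X, p)
    return b <= t, b > t

# Toy usage: random points in the ball, split by the horosphere at t = 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X = 0.9 * rng.random((100, 1)) * X / np.linalg.norm(X, axis=1, keepdims=True)
p = np.array([1.0, 0.0])
left, right = horosphere_split(X, p, 0.0)
```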
arXiv Detail & Related papers (2023-08-25T10:01:53Z)
- Tight and fast generalization error bound of graph embedding in metric space [54.279425319381374]
We show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much smaller training data than the existing bound has suggested.
Our new upper bound is significantly tighter and faster than the existing one, which can be exponential in $R$ and $O(\frac{1}{S})$ at the fastest.
arXiv Detail & Related papers (2023-05-13T17:29:18Z)
- Optimal Sparse Regression Trees [24.03491277969824]
This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees.
We leverage a novel lower bound based on an optimal solution to the one-dimensional k-Means clustering problem over the set of labels; a sketch of that subproblem follows.
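For reference, optimal 1-D k-means can be solved exactly by dynamic programming over the sorted values. Below is a minimal $O(kn^2)$ sketch; the paper's bound uses the optimal objective value, not this particular code:

```python
import numpy as np

def kmeans_1d(values, k):
    """Exact 1-D k-means via dynamic programming: sort the values, then
    D[m][j] = best SSE of splitting x[0..j] into m contiguous clusters.
    Requires k <= len(values). Returns the optimal sum of squared errors."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    pre = np.concatenate(([0.0], np.cumsum(x)))
    pre2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def sse(i, j):  # sum of squared deviations of x[i..j], inclusive
        s, s2, m = pre[j + 1] - pre[i], pre2[j + 1] - pre2[i], j - i + 1
        return s2 - s * s / m

    D = np.full((k + 1, n), np.inf)
    D[1] = [sse(0, j) for j in range(n)]
    for m in range(2, k + 1):
        for j in range(m - 1, n):
            # last cluster is x[i..j]; earlier m-1 clusters cover x[0..i-1]
            D[m][j] = min(D[m - 1][i - 1] + sse(i, j)
                          for i in range(m - 1, j + 1))
    return D[k][n - 1]
```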
arXiv Detail & Related papers (2022-11-28T00:43:21Z)
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least $40\%$ when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- Decentralized Learning of Tree-Structured Gaussian Graphical Models from Noisy Data [1.52292571922932]
This paper studies the decentralized learning of tree-structured Gaussian graphical models (GGMs) from noisy data.
GGMs are widely used to model complex networks such as gene regulatory networks and social networks.
The proposed decentralized learning uses the Chow-Liu algorithm for estimating the tree-structured GGM.
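As background, the classical (centralized) Chow-Liu step reduces to a maximum-weight spanning tree over pairwise mutual informations, which for jointly Gaussian variables follow directly from correlations. A minimal sketch of that core; the paper's decentralized, noise-robust machinery is not reproduced here:

```python
import numpy as np

def chow_liu_gaussian(samples):
    """Classical Chow-Liu for a Gaussian tree model. For jointly Gaussian
    variables, I(i; j) = -0.5 * log(1 - rho_ij^2), so the Chow-Liu tree is
    the maximum-weight spanning tree of the correlation-based MI matrix.
    `samples` is (num_samples, num_variables); returns the tree edge list."""
    rho = np.corrcoef(samples, rowvar=False)
    mi = -0.5 * np.log(np.clip(1.0 - rho ** 2, 1e-12, None))
    p = mi.shape[0]
    # Prim's algorithm, greedily growing a maximum-weight spanning tree
    in_tree, edges = {0}, []
    while len(in_tree) < p:
        i, j, _ = max(((i, j, mi[i, j])
                       for i in in_tree for j in range(p)
                       if j not in in_tree),
                      key=lambda e: e[2])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```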
arXiv Detail & Related papers (2021-09-22T10:41:18Z)
- Highly Scalable and Provably Accurate Classification in Poincare Balls [40.82908295137667]
We establish a unified framework for learning scalable and simple hyperbolic linear classifiers with provable performance guarantees.
Our results include a new hyperbolic and second-order perceptron algorithm as well as an efficient and highly accurate convex optimization setup for hyperbolic support vector machine classifiers.
We demonstrate their performance on synthetic data sets comprising millions of points, as well as on complex real-world data sets such as single-cell RNA-seq expression measurements, CIFAR10, Fashion-MNIST and mini-ImageNet.
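As rough intuition for hyperbolic linear classification, one common simplification is to lift points from the Poincaré ball to the tangent space at the origin and run a Euclidean perceptron there. The sketch below is that hedged baseline only; `tangent_perceptron` is an illustrative stand-in, not the paper's provably accurate algorithms, which operate directly in hyperbolic space:

```python
import numpy as np

def log_map_origin(X):
    """Logarithmic map of the Poincare ball at the origin:
    log_0(x) = arctanh(||x||) * x / ||x||, defined for ||x|| < 1.
    Sends ball points into the (Euclidean) tangent space."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.ones_like(norms)
    nz = norms.squeeze() > 0
    scale[nz] = np.arctanh(norms[nz]) / norms[nz]
    return scale * X

def tangent_perceptron(X_ball, y, epochs=100):
    """Vanilla perceptron on tangent-space liftings of ball points;
    labels y are in {-1, +1}. A baseline simplification only."""
    Z = log_map_origin(X_ball)
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        for z, label in zip(Z, y):
            if label * np.dot(w, z) <= 0:  # misclassified: update
                w += label * z
    return w
```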
arXiv Detail & Related papers (2021-09-08T16:59:39Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module applied to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)