Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding
- URL: http://arxiv.org/abs/2005.03847v4
- Date: Fri, 23 Oct 2020 01:37:28 GMT
- Title: Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding
- Authors: Rishi Sonthalia, Anna C. Gilbert
- Abstract summary: We explore a new method for learning hyperbolic representations by taking a metric-first approach.
Rather than determining the low-dimensional hyperbolic embedding directly, we learn a tree structure on the data.
This tree structure can then be used directly to extract hierarchical information or be embedded into a hyperbolic manifold.
- Score: 7.381113319198104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given data, finding a faithful low-dimensional hyperbolic embedding of the
data is a key method by which we can extract hierarchical information or learn
representative geometric features of the data. In this paper, we explore a new
method for learning hyperbolic representations by taking a metric-first
approach. Rather than determining the low-dimensional hyperbolic embedding
directly, we learn a tree structure on the data. This tree structure can then
be used directly to extract hierarchical information, embedded into a
hyperbolic manifold using Sarkar's construction [Sarkar], or used as a
tree approximation of the original metric. To this end, we present a novel fast
algorithm, TreeRep, such that, given a $\delta$-hyperbolic metric (for
any $\delta \geq 0$), the algorithm learns a tree structure that approximates
the original metric. In the case when $\delta = 0$, we show analytically that
TreeRep exactly recovers the original tree structure. We show
empirically that TreeRep is not only many orders of magnitude faster
than previously known algorithms, but also produces metrics with lower average
distortion and higher mean average precision than most previous algorithms for
learning hyperbolic embeddings, extracting hierarchical information, and
approximating metrics via tree metrics.
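To make the input assumption concrete: under Gromov's four-point condition, a finite metric is $\delta$-hyperbolic if, for every quadruple of points, the two largest of the three pairwise distance sums $d(x,y)+d(z,w)$, $d(x,z)+d(y,w)$, $d(x,w)+d(y,z)$ differ by at most $2\delta$, and tree metrics are exactly the $\delta = 0$ case that TreeRep recovers exactly. The sketch below is illustrative only (it is not the paper's TreeRep code, and the function name is hypothetical); it computes this four-point $\delta$ by brute force from a distance matrix.

```python
import itertools
import numpy as np

def four_point_delta(D: np.ndarray) -> float:
    """Gromov four-point hyperbolicity of a finite metric (brute force).

    D is an n x n symmetric distance matrix. For every quadruple
    (x, y, z, w) the three pairwise sums
        d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z)
    are formed; the metric is delta-hyperbolic iff the two largest sums
    always differ by at most 2*delta. A tree metric yields exactly 0.
    O(n^4), for illustration only -- not the TreeRep algorithm itself.
    """
    delta = 0.0
    for x, y, z, w in itertools.combinations(range(D.shape[0]), 4):
        s = sorted((D[x, y] + D[z, w], D[x, z] + D[y, w], D[x, w] + D[y, z]))
        delta = max(delta, (s[2] - s[1]) / 2.0)
    return delta

# Example: shortest-path metric of a star tree with edge lengths 1, 2, 3.
D = np.array([[0, 1, 2, 3],
              [1, 0, 3, 4],
              [2, 3, 0, 5],
              [3, 4, 5, 0]], dtype=float)
print(four_point_delta(D))  # 0.0 -- tree metrics are 0-hyperbolic
```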
Related papers
- ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.
Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.
Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
- Fitting trees to $\ell_1$-hyperbolic distances [4.220336689294244]
Building trees to represent or to fit distances is a critical component of phylogenetic analysis.
We study the tree fitting problem as that of finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding.
arXiv Detail & Related papers (2024-09-02T07:38:32Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal with respect to various measures such as size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
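Based only on the summary above, that merging rule can be illustrated with a minimal sketch (the function name is hypothetical and the naive O(n^3) loop is not the authors' implementation): clusters are merged greedily by maximum average cross-cluster dot product, and the merge sequence can be read as a binary tree (dendrogram).

```python
import numpy as np

def dot_product_agglomeration(X: np.ndarray):
    """Agglomerative clustering that merges the pair of clusters with the
    maximum average dot product between their points (instead of minimum
    distance or within-cluster variance). Returns the merge history, which
    defines a binary tree over the data points. Naive sketch for illustration.
    """
    clusters = {i: [i] for i in range(len(X))}   # cluster id -> point indices
    merges, next_id = [], len(X)
    while len(clusters) > 1:
        ids = list(clusters)
        best_score, best_pair = -np.inf, None
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = ids[i], ids[j]
                # average dot product over all cross-cluster pairs of points
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        a, b = best_pair
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges   # sequence of (child_a, child_b, new_cluster_id)

# Tiny usage example on random data
rng = np.random.default_rng(0)
print(dot_product_agglomeration(rng.normal(size=(6, 3))))
```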
- New Linear-time Algorithm for SubTree Kernel Computation based on Root-Weighted Tree Automata [0.0]
We propose a new linear time algorithm based on the concept of weighted tree automata for SubTree kernel computation.
The key idea behind the proposed algorithm is to replace the DAG reduction and node-sorting steps.
Our approach has three major advantages: it is output-sensitive, it is insensitive to the tree type (ordered versus unordered trees), and it is well suited to any incremental tree-kernel-based learning method.
arXiv Detail & Related papers (2023-02-02T13:37:48Z)
- HyperAid: Denoising in hyperbolic spaces for tree-fitting and hierarchical clustering [36.738414547278154]
We propose a new approach to tree-metric denoising (HyperAid) in hyperbolic spaces.
It transforms the original data into data that is "more" tree-like, when evaluated in terms of Gromov's $\delta$-hyperbolicity.
We integrate HyperAid with schemes for enforcing nonnegative edge-weights.
arXiv Detail & Related papers (2022-05-19T17:33:16Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
- Metric Learning for Ordered Labeled Trees with pq-grams [11.284638114256712]
We propose a new metric learning approach for tree-structured data with pq-grams.
The pq-gram distance is a distance for ordered labeled trees, and has much lower computation cost than the tree edit distance.
We empirically show that the proposed approach achieves competitive results with the state-of-the-art edit distance-based methods.
arXiv Detail & Related papers (2020-03-09T08:04:47Z)
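The pq-gram distance referred to in the entry above is not defined in this digest. As a hedged illustration (this follows the commonly used pq-gram profile construction with '*' dummy nodes, not the metric-learning method proposed in that paper; the function names are hypothetical), a compact sketch:

```python
from collections import Counter

def pq_profile(tree, p=2, q=3, stem=None, profile=None):
    """Bag (multiset) of pq-grams of an ordered labeled tree.

    A tree is (label, [children]). Each pq-gram collects the labels of an
    anchor node, its p-1 nearest ancestors, and q consecutive children,
    padded with '*' dummy labels at the borders.
    """
    if profile is None:
        profile = Counter()
    if stem is None:
        stem = ('*',) * p
    label, children = tree
    stem = stem[1:] + (label,)            # shift the anchor label into the stem
    base = ('*',) * q
    if not children:
        profile[stem + base] += 1         # leaf anchor: all-dummy base
    else:
        for child in children:
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            pq_profile(child, p, q, stem, profile)
        for _ in range(q - 1):            # trailing windows past the last child
            base = base[1:] + ('*',)
            profile[stem + base] += 1
    return profile

def pq_gram_distance(t1, t2, p=2, q=3):
    """Normalized pq-gram distance: 1 - 2|P1 ∩ P2| / (|P1| + |P2|), with the
    intersection taken as a bag intersection; 0 means identical profiles."""
    p1, p2 = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((p1 & p2).values())
    return 1.0 - 2.0 * shared / (sum(p1.values()) + sum(p2.values()))

# Two small ordered labeled trees that differ in a single leaf label
t1 = ('a', [('b', []), ('c', [('d', [])])])
t2 = ('a', [('b', []), ('c', [('e', [])])])
print(pq_gram_distance(t1, t2))   # strictly between 0 and 1
```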
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.