A Comparison of Polynomial-Based Tree Clustering Methods
- URL: http://arxiv.org/abs/2601.14285v1
- Date: Tue, 13 Jan 2026 19:53:24 GMT
- Title: A Comparison of Polynomial-Based Tree Clustering Methods
- Authors: Pengyu Liu, Mariel Vázquez, Nataša Jonoska
- Abstract summary: Tree structures appear in many fields of the life sciences, including phylogenetics, developmental biology and nucleic acid structures. Recent developments in sequencing technology have yielded numerous biological data that can be represented with tree structures. Tree polynomials provide a computationally efficient, interpretable and comprehensive way to encode tree structures as matrices.
- Score: 0.799840210529735
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Tree structures appear in many fields of the life sciences, including phylogenetics, developmental biology and nucleic acid structures. Trees can be used to represent RNA secondary structures, which directly relate to the function of non-coding RNAs. Recent developments in sequencing technology and artificial intelligence have yielded large amounts of biological data that can be represented with tree structures. This requires novel methods for tree structure data analytics. Tree polynomials provide a computationally efficient, interpretable and comprehensive way to encode tree structures as matrices, which are compatible with most data analytics tools. Machine learning methods based on the Canberra distance between tree polynomials have been introduced to analyze phylogenies and nucleic acid structures. In this paper, we compare the performance of different distances in tree clustering methods based on a tree distinguishing polynomial. We also implement two basic autoencoder models for clustering trees using the polynomial. We find that the distance-based methods with entry-level normalized distances have the highest clustering accuracy among the compared methods.
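The abstract refers to the Canberra distance between tree polynomials encoded as matrices. A minimal sketch of that distance follows, assuming each tree has already been encoded as a coefficient matrix of the same shape. The matrices `t1`, `t2` and `t3` and the function name `canberra` are hypothetical placeholders for illustration, not data or code from the paper.

```python
def canberra(a, b):
    """Canberra distance between two equally shaped coefficient matrices.

    Each entry contributes |x - y| / (|x| + |y|), so every entry is
    normalized on its own; entries where both values are 0 contribute 0.
    """
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            denom = abs(x) + abs(y)
            if denom:  # skip 0/0 terms, conventionally counted as 0
                total += abs(x - y) / denom
    return total


# Hypothetical coefficient matrices (NOT taken from the paper).
t1 = [[0, 1], [1, 0]]
t2 = [[0, 2], [1, 0]]
t3 = [[0, 1], [1, 0]]

d12 = canberra(t1, t2)  # only one entry differs: |1 - 2| / (1 + 2)
d13 = canberra(t1, t3)  # identical matrices
```

A pairwise matrix of such distances could then be passed to a standard clustering routine. Which routine the authors use is not stated in the abstract.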
Related papers
- Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration [52.52685988964061]
Entropy-Tree is a tree-based decoding method that exploits entropy as a signal for branching decisions. It unifies efficient structured exploration and reliable uncertainty estimation within a single decoding procedure.
arXiv Detail & Related papers (2026-01-02T07:14:05Z) - Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis [49.00783841494125]
HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and quantized diffusion processes. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets. These contributions provide a new tool for hierarchical lineage analysis, enabling more accurate and efficient modeling of cellular differentiation paths.
arXiv Detail & Related papers (2025-06-29T15:19:13Z) - PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation [50.80441546742053]
Phylogenetic trees elucidate evolutionary relationships among species. Traditional Markov Chain Monte Carlo methods face slow convergence and computational burdens. We propose PhyloGen, a novel method leveraging a pre-trained genomic language model.
arXiv Detail & Related papers (2024-12-25T08:33:05Z) - On the Expressive Power of Tree-Structured Probabilistic Circuits [8.160496835449157]
We show that for $n$ variables, there exists a quasi-polynomial upper bound $n^{O(\log n)}$ on the size of an equivalent tree computing the same probability distribution.
We also show that given a depth restriction on the tree, there is a super-polynomial separation between tree and DAG-structured PCs.
arXiv Detail & Related papers (2024-10-07T19:51:30Z) - Learning accurate and interpretable tree-based models [27.203303726977616]
We develop approaches to design tree-based learning algorithms given repeated access to data from the same domain. We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between popularly used entropy and Gini impurity based criteria. We extend our results to tuning popular tree-based ensembles, including random forests and gradient-boosted trees.
arXiv Detail & Related papers (2024-05-24T20:10:10Z) - Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - Phylo2Vec: a vector representation for binary trees [0.49478969093606673]
We present Phylo2Vec, a parsimonious encoding for phylogenetic trees. It serves as a unified approach for both manipulating and representing phylogenetic trees. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets.
arXiv Detail & Related papers (2023-04-25T09:54:35Z) - New Linear-time Algorithm for SubTree Kernel Computation based on Root-Weighted Tree Automata [0.0]
We propose a new linear time algorithm based on the concept of weighted tree automata for SubTree kernel computation.
The key idea behind the proposed algorithm is to replace the DAG reduction and node sorting steps.
Our approach has three major advantages: it is output-sensitive, it is insensitive to the tree type (ordered versus unordered trees), and it is well adapted to any incremental tree-kernel-based learning method.
arXiv Detail & Related papers (2023-02-02T13:37:48Z) - Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)