Related papers: Combinatorial and computational investigations of Neighbor-Joining bias

Combinatorial and computational investigations of Neighbor-Joining bias

URL: http://arxiv.org/abs/2007.09345v3
Date: Wed, 16 Sep 2020 20:54:55 GMT
Title: Combinatorial and computational investigations of Neighbor-Joining bias
Authors: Ruth Davidson and Abraham Martin del Campo
Abstract summary: The Neighbor-Joining algorithm computes a tree metric from a dissimilarity map arising from biological data. A full description of these regions has not been found yet. Different sequences of Neighbor-Joining agglomeration events can produce the same tree, therefore associating multiple geometric regions to the same output.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Neighbor-Joining algorithm is a popular distance-based phylogenetic method that computes a tree metric from a dissimilarity map arising from biological data. Realizing dissimilarity maps as points in Euclidean space, the algorithm partitions the input space into polyhedral regions indexed by the combinatorial type of the trees returned. A full combinatorial description of these regions has not been found yet; different sequences of Neighbor-Joining agglomeration events can produce the same combinatorial tree, therefore associating multiple geometric regions to the same algorithmic output. We resolve this confusion by defining agglomeration orders on trees, leading to a bijection between distinct regions of the output space and weighted Motzkin paths. As a result, we give a formula for the number of polyhedral regions depending only on the number of taxa. We conclude with a computational comparison between these polyhedral regions, to unveil biases introduced in any implementation of the algorithm.

Related papers

Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces [94.12116458306916]
We show that in $q$-metric spaces, metric trees can leverage a stronger version of the triangle inequality to reduce comparisons for exact search.<n>We propose a novel projection method that embeds datasets with arbitrary dissimilarity measures into $q$-metric spaces.
arXiv Detail & Related papers (2025-06-06T22:09:44Z)
The Computational Complexity of Counting Linear Regions in ReLU Neural Networks [6.363158395541767]
There exist many different definitions of what a linear region actually is.<n>We analyze the computational complexity of counting the number of such regions for the various definitions.<n>On the algorithmic side, we demonstrate that counting linear regions can at least be achieved in space for some common definitions.
arXiv Detail & Related papers (2025-05-22T14:25:12Z)
LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
Boundary Detection Algorithm Inspired by Locally Linear Embedding [8.259071011958254]
We propose a method for detecting boundary points inspired by the widely used locally linear embedding algorithm. We implement this method using two nearest neighborhood search schemes: the $epsilon$-radius ball scheme and the $K$-nearest neighbor scheme.
arXiv Detail & Related papers (2024-06-26T16:05:57Z)
Hierarchical Clustering via Local Search [0.0]
We introduce a local search algorithm for hierarchical clustering. We show that any locally optimal tree guarantees a revenue of at least $fracn-23sum_i jw(i,j)$ where is $n$ the number of objects and $w: [n] times [n] mathbbR+$ is the associated similarity function.
arXiv Detail & Related papers (2024-05-24T23:46:24Z)
Optimal estimation of Gaussian (poly)trees [25.02920605955238]
We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery) The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning.
arXiv Detail & Related papers (2024-02-09T12:58:36Z)
An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued pure exploration problem in the multi-armed bandit (R-CPE-MAB) Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size. We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
Algorithmic Determination of the Combinatorial Structure of the Linear Regions of ReLU Neural Networks [0.0]
We determine the regions and facets of all dimensions of the canonical polyhedral complex. We present an algorithm which calculates this full canonical structure. The resulting algorithm is numerically stable, time in the number of intermediate neurons, and obtains accurate information across all dimensions.
arXiv Detail & Related papers (2022-07-15T18:36:12Z)
Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering. We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
Partition-based formulations for mixed-integer optimization of trained ReLU neural networks [66.88252321870085]
This paper introduces a class of mixed-integer formulations for trained ReLU neural networks. At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node.
arXiv Detail & Related papers (2021-02-08T17:27:34Z)
Convex Polytope Trees [57.56078843831244]
convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary. We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
Data Structures & Algorithms for Exact Inference in Hierarchical Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for emphexact inference in hierarchical clustering based on a novel trellis data structure. Our algorithms scale in time and space proportional to the powerset of $N$ elements which is super-exponentially more efficient than explicitly considering each of the (2N-3)!! possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.