Combinatorial and computational investigations of Neighbor-Joining bias
- URL: http://arxiv.org/abs/2007.09345v3
- Date: Wed, 16 Sep 2020 20:54:55 GMT
- Title: Combinatorial and computational investigations of Neighbor-Joining bias
- Authors: Ruth Davidson and Abraham Martin del Campo
- Abstract summary: The Neighbor-Joining algorithm computes a tree metric from a dissimilarity map arising from biological data.
The algorithm partitions the input space into polyhedral regions, and a full combinatorial description of these regions has not been found yet.
Different sequences of Neighbor-Joining agglomeration events can produce the same tree, thereby associating multiple geometric regions with the same output.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Neighbor-Joining algorithm is a popular distance-based phylogenetic
method that computes a tree metric from a dissimilarity map arising from
biological data. Realizing dissimilarity maps as points in Euclidean space, the
algorithm partitions the input space into polyhedral regions indexed by the
combinatorial type of the trees returned. A full combinatorial description of
these regions has not been found yet; different sequences of Neighbor-Joining
agglomeration events can produce the same combinatorial tree, therefore
associating multiple geometric regions to the same algorithmic output. We
resolve this confusion by defining agglomeration orders on trees, leading to a
bijection between distinct regions of the output space and weighted Motzkin
paths. As a result, we give a formula for the number of polyhedral regions
depending only on the number of taxa. We conclude with a computational
comparison between these polyhedral regions, to unveil biases introduced in any
implementation of the algorithm.
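To make the polyhedral picture in the abstract concrete, here is a minimal sketch of a single Neighbor-Joining agglomeration step using the standard Q-criterion. It is not the authors' code; the function names (`q_criterion`, `pick_cherry`, `agglomerate`) and the example matrix are illustrative assumptions. Because each Q-value is a linear function of the entries of the dissimilarity map, the set of inputs for which a given pair wins is cut out by linear inequalities, which is why the regions are polyhedral.

```python
from itertools import combinations

def q_criterion(d, i, j):
    """Standard NJ Q-value for the pair (i, j); linear in the entries of d."""
    n = len(d)
    return (n - 2) * d[i][j] - sum(d[i]) - sum(d[j])

def pick_cherry(d):
    """Return the pair minimizing Q. Ties are broken by enumeration order,
    which is an arbitrary choice made for this sketch."""
    return min(combinations(range(len(d)), 2),
               key=lambda pair: q_criterion(d, *pair))

def agglomerate(d, i, j):
    """Join taxa i and j into one new node, appended as the last row/column.
    Uses the usual reduction d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2."""
    rest = [k for k in range(len(d)) if k not in (i, j)]
    reduced = [[d[a][b] for b in rest] for a in rest]
    to_new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in rest]
    for row, dist in zip(reduced, to_new):
        row.append(dist)
    reduced.append(to_new + [0.0])
    return reduced

# Hypothetical input: the tree metric of the quartet ((0,1),(2,3))
# with all five edges of length 1.
d_example = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

first_pair = pick_cherry(d_example)
reduced_map = agglomerate(d_example, *first_pair)
```

With four taxa, complementary pairs such as (0, 1) and (2, 3) always have equal Q-values, so the pair chosen here depends on the tie-breaking rule. Repeating `pick_cherry` and `agglomerate` until three nodes remain gives one sequence of agglomeration events; different sequences can yield the same tree, which is the ambiguity the abstract describes.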
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
- Boundary Detection Algorithm Inspired by Locally Linear Embedding [8.259071011958254]
We propose a method for detecting boundary points inspired by the widely used locally linear embedding algorithm.
We implement this method using two nearest neighborhood search schemes: the $\epsilon$-radius ball scheme and the $K$-nearest neighbor scheme.
arXiv Detail & Related papers (2024-06-26T16:05:57Z)
- Hierarchical Clustering via Local Search [0.0]
We introduce a local search algorithm for hierarchical clustering.
We show that any locally optimal tree guarantees a revenue of at least $\frac{n-2}{3}\sum_{i<j} w(i,j)$, where $n$ is the number of objects and $w: [n] \times [n] \to \mathbb{R}^+$ is the associated similarity function.
arXiv Detail & Related papers (2024-05-24T23:46:24Z)
- Optimal estimation of Gaussian (poly)trees [25.02920605955238]
We consider both distribution learning (i.e., in KL distance) and structure learning (i.e., exact recovery).
The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently.
The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning.
arXiv Detail & Related papers (2024-02-09T12:58:36Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Algorithmic Determination of the Combinatorial Structure of the Linear Regions of ReLU Neural Networks [0.0]
We determine the regions and facets of all dimensions of the canonical polyhedral complex.
We present an algorithm which calculates this full canonical structure.
The resulting algorithm is numerically stable, time in the number of intermediate neurons, and obtains accurate information across all dimensions.
arXiv Detail & Related papers (2022-07-15T18:36:12Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Partition-based formulations for mixed-integer optimization of trained ReLU neural networks [66.88252321870085]
This paper introduces a class of mixed-integer formulations for trained ReLU neural networks.
At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node.
arXiv Detail & Related papers (2021-02-08T17:27:34Z)
- Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
- Data Structures & Algorithms for Exact Inference in Hierarchical Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure.
Our algorithms scale in time and space proportional to the powerset of $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z)