Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding
- URL: http://arxiv.org/abs/2312.16074v2
- Date: Fri, 3 May 2024 14:39:30 GMT
- Title: Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding
- Authors: Yibo Kong, George P. Tiley, Claudia Solis-Lemus
- Abstract summary: We show that our split-weight embedded clustering is able to recover meaningful evolutionary relationships in simulated and real (Adansonia baobabs) data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised learning has become a staple in classical machine learning, successfully identifying clustering patterns in data across a broad range of domain applications. Surprisingly, despite its accuracy and elegant simplicity, unsupervised learning has not been sufficiently exploited in the realm of phylogenetic tree inference. The main reason for the delay in adoption of unsupervised learning in phylogenetics is the lack of a meaningful, yet simple, way of embedding phylogenetic trees into a vector space. Here, we propose the simple yet powerful split-weight embedding which allows us to fit standard clustering algorithms to the space of phylogenetic trees. We show that our split-weight embedded clustering is able to recover meaningful evolutionary relationships in simulated and real (Adansonia baobabs) data.
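The split-weight embedding described in the abstract maps each tree to a vector of branch lengths indexed by splits (the bipartitions of taxa induced by the tree's edges), so that standard clustering algorithms such as k-means can run on the resulting vectors. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each tree has already been pre-processed into a dict mapping a split (a frozenset of the taxa on one side of an edge) to that edge's weight, and the function name and input format are hypothetical.

```python
def split_weight_embedding(trees):
    """Embed each tree as a vector of branch lengths indexed by splits.

    trees: list of dicts, each mapping a split (frozenset of taxon labels
    on one side of an edge) to that edge's weight. Splits absent from a
    tree get coordinate 0.0, so all vectors share one common basis.
    """
    # Union of splits across all trees, in a deterministic order.
    all_splits = sorted({s for t in trees for s in t}, key=sorted)
    vectors = [[t.get(s, 0.0) for s in all_splits] for t in trees]
    return vectors, all_splits


# Two toy trees on taxa {A, B, C}, each described by its weighted splits.
t1 = {frozenset({"A", "B"}): 1.2, frozenset({"A"}): 0.3}
t2 = {frozenset({"A", "B"}): 0.9, frozenset({"C"}): 0.5}
vecs, splits = split_weight_embedding([t1, t2])
# vecs == [[0.3, 1.2, 0.0], [0.0, 0.9, 0.5]]
```

Because every tree lands in the same fixed-dimensional space, any off-the-shelf clustering routine can be applied to `vecs` directly.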
Related papers
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- URLOST: Unsupervised Representation Learning without Stationarity or Topology [26.17135629579595]
We introduce a novel framework that learns from high-dimensional data lacking stationarity and topology.
Our model combines a learnable self-organizing layer, density adjusted spectral clustering, and masked autoencoders.
We evaluate its effectiveness on simulated biological vision data, neural recordings from the primary visual cortex, and gene expression datasets.
arXiv Detail & Related papers (2023-10-06T18:00:02Z)
- Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees [0.49478969093606673]
We perform both tree exploration and inference in a continuous space where the computation of gradients is possible.
This continuous relaxation allows for major leaps across tree space in both rooted and unrooted trees, and is less susceptible to convergence to local minima.
Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases.
arXiv Detail & Related papers (2023-06-09T08:13:06Z)
- Constructing Phylogenetic Networks via Cherry Picking and Machine Learning [0.1045050906735615]
Existing methods are computationally expensive and can either handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks.
We apply the recently-introduced theoretical framework of cherry picking to design a class of efficient heuristics that are guaranteed to produce a network containing each of the input trees.
We also propose simple and fast randomised heuristics that prove to be very effective when run multiple times.
arXiv Detail & Related papers (2023-03-31T15:04:42Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method is able to achieve similar performances on image classification for balanced datasets.
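The fixed classifier summarized above is a simplex equiangular tight frame: K unit-norm class vectors whose pairwise cosine similarity is exactly -1/(K-1). A hedged numpy sketch of the standard construction (not the paper's code; the function name and random-basis choice are illustrative):

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Build K class vectors in d dims (d >= K) forming a simplex ETF.

    Rows of the result are unit-norm, and every pair has cosine
    similarity exactly -1/(K-1).
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal basis U of shape (d, K) via reduced QR.
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Classic construction: sqrt(K/(K-1)) * U (I - (1/K) 11^T).
    M = np.sqrt(K / (K - 1)) * (U @ (np.eye(K) - np.ones((K, K)) / K))
    return M.T  # shape (K, d)
```

Fixing such a matrix as the final layer removes its parameters from training while keeping the class directions maximally separated.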
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Neural Architecture Search of Deep Priors: Towards Continual Learning without Catastrophic Interference [2.922007656878633]
We show that it is possible to find random-weight architectures, a deep prior, that enable a linear classifier to perform on par with fully trained deep counterparts.
In an extension to continual learning, we investigate the possibility of catastrophic interference free incremental learning.
arXiv Detail & Related papers (2021-04-14T11:25:30Z)
- Intersection Regularization for Extracting Semantic Attributes [72.53481390411173]
We consider the problem of supervised classification, such that the features that the network extracts match an unseen set of semantic attributes.
For example, when learning to classify images of birds into species, we would like to observe the emergence of features that zoologists use to classify birds.
We propose training a neural network with discrete top-level activations, which is followed by a multi-layered perceptron (MLP) and a parallel decision tree.
arXiv Detail & Related papers (2021-03-22T14:32:44Z)
- Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data [48.4779912667317]
Self-training algorithms have been very successful for learning with unlabeled data using neural networks.
This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning.
arXiv Detail & Related papers (2020-10-07T19:43:55Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed as Forest R-CNN, can serve as a plug-and-play module being applied to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
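The optimal transport aggregation described in the last entry can be sketched with entropic OT solved by Sinkhorn iterations. This is an illustrative approximation only, assuming a squared Euclidean cost, uniform weights, and a fixed (rather than trainable) reference; the function names are hypothetical and this is not the paper's implementation.

```python
import numpy as np

def sinkhorn_plan(C, a, b, eps=1.0, n_iter=200):
    """Entropic OT plan between marginals a (n,) and b (p,) for cost C (n, p)."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):       # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_aggregate(X, Z, eps=1.0):
    """Embed the set X (n, d) as a fixed-size (p, d) array via the
    transport plan between X and a reference set Z (p, d)."""
    n, p = X.shape[0], Z.shape[0]
    C = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
    P = sinkhorn_plan(C, np.full(n, 1.0 / n), np.full(p, 1.0 / p), eps)
    # Each output row is a plan-weighted average of X; scaling by p
    # normalizes the column marginal 1/p away.
    return p * (P.T @ X)
```

Whatever the size n of the input set, the output has fixed shape (p, d), which is what makes the embedding usable as a pooling layer.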
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.