Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence
Alignment
- URL: http://arxiv.org/abs/2011.04832v2
- Date: Sat, 13 Feb 2021 02:01:40 GMT
- Title: Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence
Alignment
- Authors: Govinda M. Kamath and Tavor Z. Baharav and Ilan Shomorony
- Abstract summary: We propose a new approach to pairwise alignment estimation based on two key new ingredients.
The first ingredient is to cast the problem of pairwise alignment estimation under a general framework of rank-one crowdsourcing models.
The second ingredient is to utilise a multi-armed bandit algorithm to adaptively refine this spectral estimator only for read pairs that are likely to have large alignments.
- Score: 20.243497164051824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in
bioinformatics and typically represents a heavy computational burden.
State-of-the-art approaches to speed up this task use hashing to identify short
segments (k-mers) that are shared by pairs of reads, which can then be used to
estimate alignment scores. However, when the number of reads is large,
accurately estimating alignment scores for all pairs is still very costly.
Moreover, in practice, one is only interested in identifying pairs of reads
with large alignment scores. In this work, we propose a new approach to
pairwise alignment estimation based on two key new ingredients. The first
ingredient is to cast the problem of pairwise alignment estimation under a
general framework of rank-one crowdsourcing models, where the workers'
responses correspond to k-mer hash collisions. These models can be accurately
solved via a spectral decomposition of the response matrix. The second
ingredient is to utilise a multi-armed bandit algorithm to adaptively refine
this spectral estimator only for read pairs that are likely to have large
alignments. The resulting algorithm iteratively performs a spectral
decomposition of the response matrix for adaptively chosen subsets of the read
pairs.
Related papers
- Semisupervised score based matching algorithm to evaluate the effect of public health interventions [3.221788913179251]
In one-to-one matching algorithms, a large number of "pairs" to be matched could mean both the information from a large sample and a large number of tasks.
We propose a novel one-to-one matching algorithm based on a quadratic score function $S_beta(x_i,x_j)= betaT (x_i-x_j)(x_i-x_j)T beta$.
arXiv Detail & Related papers (2024-03-19T02:24:16Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Deep Probabilistic Graph Matching [72.6690550634166]
We propose a deep learning-based graph matching framework that works for the original QAP without compromising on the matching constraints.
The proposed method is evaluated on three popularly tested benchmarks (Pascal VOC, Willow Object and SPair-71k) and it outperforms all previous state-of-the-arts on all benchmarks.
arXiv Detail & Related papers (2022-01-05T13:37:27Z) - Regularization for Shuffled Data Problems via Exponential Family Priors
on the Permutation Group [8.40077201352607]
"Shuffled data" is data in which the correct pairing of (X, Y)-pairs is represented via an unknown index permutation.
We propose a flexible exponential family prior on the permutation group for this purpose.
Inference is based on the EM algorithm in which the intractable E-step is approximated by the Fisher-Yates algorithm.
arXiv Detail & Related papers (2021-11-02T17:43:28Z) - Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise
Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - An Empirical Process Approach to the Union Bound: Practical Algorithms
for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum
Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource.
We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z) - Discrete-Valued Latent Preference Matrix Estimation with Graph Side
Information [12.836994708337144]
We develop an algorithm that matches the optimal sample complexity.
Our algorithm is robust to model errors and outperforms the existing algorithms in terms of prediction performance.
arXiv Detail & Related papers (2020-03-16T06:29:24Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.