Related papers: Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment

Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment

URL: http://arxiv.org/abs/2011.04832v2
Date: Sat, 13 Feb 2021 02:01:40 GMT
Title: Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment
Authors: Govinda M. Kamath and Tavor Z. Baharav and Ilan Shomorony
Abstract summary: We propose a new approach to pairwise alignment estimation based on two key new ingredients. The first ingredient is to cast the problem of pairwise alignment estimation under a general framework of rank-one crowdsourcing models. The second ingredient is to utilise a multi-armed bandit algorithm to adaptively refine this spectral estimator only for read pairs that are likely to have large alignments.
Score: 20.243497164051824
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. State-of-the-art approaches to speed up this task use hashing to identify short segments (k-mers) that are shared by pairs of reads, which can then be used to estimate alignment scores. However, when the number of reads is large, accurately estimating alignment scores for all pairs is still very costly. Moreover, in practice, one is only interested in identifying pairs of reads with large alignment scores. In this work, we propose a new approach to pairwise alignment estimation based on two key new ingredients. The first ingredient is to cast the problem of pairwise alignment estimation under a general framework of rank-one crowdsourcing models, where the workers' responses correspond to k-mer hash collisions. These models can be accurately solved via a spectral decomposition of the response matrix. The second ingredient is to utilise a multi-armed bandit algorithm to adaptively refine this spectral estimator only for read pairs that are likely to have large alignments. The resulting algorithm iteratively performs a spectral decomposition of the response matrix for adaptively chosen subsets of the read pairs.

Related papers

Randomized Pairwise Learning with Adaptive Sampling: A PAC-Bayes Analysis [32.8453673919231]
We study optimization with data-adaptive sampling schemes to train pairwise learning models. A notable difference of pairwise learning from pointwise learning is the statistical ascent among input pairs.
arXiv Detail & Related papers (2025-04-03T18:24:01Z)
Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores. We flexibly estimate the joint cumulative distribution function (CDF) of the scores. Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval [18.333752341467083]
The biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries. This work proposes an approximation to cross-attention scoring based on vector quantization. We show that retrieval based shortlisting allows the system to efficiently leverage biasing catalogues of several thousands of entries.
arXiv Detail & Related papers (2024-11-01T15:28:03Z)
Semisupervised score based matching algorithm to evaluate the effect of public health interventions [3.221788913179251]
In one-to-one matching algorithms, a large number of "pairs" to be matched could mean both the information from a large sample and a large number of tasks. We propose a novel one-to-one matching algorithm based on a quadratic score function $S_beta(x_i,x_j)= betaT (x_i-x_j)(x_i-x_j)T beta$.
arXiv Detail & Related papers (2024-03-19T02:24:16Z)
Efficient distributed representations with linear-time attention scores normalization [3.8673630752805437]
We propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms. The accuracy of our estimation formula surpasses competing kernel methods by even orders of magnitude. The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem.
arXiv Detail & Related papers (2023-03-30T15:48:26Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Deep Probabilistic Graph Matching [72.6690550634166]
We propose a deep learning-based graph matching framework that works for the original QAP without compromising on the matching constraints. The proposed method is evaluated on three popularly tested benchmarks (Pascal VOC, Willow Object and SPair-71k) and it outperforms all previous state-of-the-arts on all benchmarks.
arXiv Detail & Related papers (2022-01-05T13:37:27Z)
Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group [8.40077201352607]
"Shuffled data" is data in which the correct pairing of (X, Y)-pairs is represented via an unknown index permutation. We propose a flexible exponential family prior on the permutation group for this purpose. Inference is based on the EM algorithm in which the intractable E-step is approximated by the Fisher-Yates algorithm.
arXiv Detail & Related papers (2021-11-02T17:43:28Z)
Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items. We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons. We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc. Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource. We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback. We devise an algorithm with a minimal cluster recovery error rate. For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.