Efficient Approximate Kernel Based Spike Sequence Classification
- URL: http://arxiv.org/abs/2209.04952v1
- Date: Sun, 11 Sep 2022 22:44:19 GMT
- Title: Efficient Approximate Kernel Based Spike Sequence Classification
- Authors: Sarwan Ali, Bikram Sahoo, Muhammad Asad Khan, Alexander Zelikovsky,
Imdad Ullah Khan, Murray Patterson
- Abstract summary: Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
- Score: 56.2938724367661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models, such as SVM, for tasks like classification and
clustering of sequences, require a definition of distance/similarity between
pairs of sequences. Several methods have been proposed to compute the
similarity between sequences, such as the exact approach that counts the number
of matches between $k$-mers (sub-sequences of length $k$) and an approximate
approach that estimates pairwise similarity scores. Although exact methods
yield better classification performance, they pose high computational costs,
limiting their applicability to a small number of sequences. The approximate
algorithms are proven to be more scalable and perform comparably to (sometimes
better than) the exact methods -- they are designed in a "general" way to deal
with different types of sequences (e.g., music, protein, etc.). Although
general applicability is a desired property of an algorithm, it is not the case
in all scenarios. For example, in the current COVID-19 (coronavirus) pandemic,
there is a need for an approach that can deal specifically with the
coronavirus. To this end, we propose a series of ways to improve the
performance of the approximate kernel (using minimizers and information gain)
in order to enhance its predictive performance on coronavirus sequences. More
specifically, we improve the quality of the approximate kernel using domain
knowledge (computed using information gain) and efficient preprocessing (using
minimizers computation) to classify coronavirus spike protein sequences
corresponding to different variants (e.g., Alpha, Beta, Gamma). We report
results using different classification and clustering algorithms and evaluate
their performance using multiple evaluation metrics. Using two datasets, we
show that our proposed method helps improve the kernel's performance compared
to the baseline and state-of-the-art approaches in the healthcare domain.
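The two building blocks named in the abstract, exact $k$-mer matching and minimizers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the exact kernel is written as a plain spectrum-style count of matching $k$-mer pairs, and the minimizer is assumed to be the lexicographically smallest length-$m$ substring of each $k$-mer (forward direction only, with $m < k$).

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def exact_kmer_kernel(x: str, y: str, k: int) -> int:
    """Count matching k-mer pairs between x and y (spectrum-kernel style).

    Each k-mer contributes (occurrences in x) * (occurrences in y).
    """
    cx, cy = Counter(kmers(x, k)), Counter(kmers(y, k))
    return sum(n * cy[kmer] for kmer, n in cx.items())


def minimizers(seq: str, k: int, m: int) -> list[str]:
    """Assumed definition: for each k-mer, keep its smallest m-mer (m < k)."""
    if not 0 < m < k:
        raise ValueError("need 0 < m < k")
    return [min(kmers(kmer, m)) for kmer in kmers(seq, k)]


if __name__ == "__main__":
    # Toy strings for illustration only; not real spike protein data.
    a, b = "MFVFLVLLPLVS", "MFVFFVLLPLVS"
    print(exact_kmer_kernel(a, b, k=3))
    print(minimizers(a, k=5, m=3))
```

Computing the kernel on the minimizer list instead of the full $k$-mer list is one way such preprocessing can shrink the input, since neighboring $k$-mers often share the same minimizer.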
Related papers
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, NCC aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To better exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- BioSequence2Vec: Efficient Embedding Generation For Biological Sequences [1.0896567381206714]
We propose a general-purpose representation learning approach that embodies kernel methods' qualities while avoiding computation, memory, and generalizability challenges.
Our proposed fast and alignment-free embedding method can be used as input to any distance.
We perform a variety of real-world classification tasks, such as SARS-CoV-2 lineage and gene family classification, outperforming several state-of-the-art embedding and kernel methods in predictive performance.
arXiv Detail & Related papers (2023-04-01T10:58:21Z)
- ClaPIM: Scalable Sequence CLAssification using Processing-In-Memory [1.6124241068249217]
ClaPIM is a scalable DNA sequence classification architecture based on the emerging concept of hybrid in-crossbar and near-crossbar memristive processing-in-memory (PIM).
Compared with Kraken2, ClaPIM provides significantly higher classification quality (up to 20x improvement in F1 score) and also demonstrates a 1.8x throughput improvement.
arXiv Detail & Related papers (2023-02-16T13:30:36Z)
- Evaluating COVID-19 Sequence Data Using Nearest-Neighbors Based Network Model [0.0]
SARS-CoV-2 coronavirus is the cause of the COVID-19 disease in humans.
It can adapt to different hosts and evolve into different lineages.
It is well-known that the major SARS-CoV-2 lineages are characterized by mutations that happen predominantly in the spike protein.
arXiv Detail & Related papers (2022-11-19T00:34:02Z)
- Ensemble Learning based on Classifier Prediction Confidence and Comprehensive Learning Particle Swarm Optimisation for polyp localisation [6.212408891922064]
Colorectal cancer (CRC) is the first cause of death in many countries.
In this paper, we introduce an ensemble of medical polyp segmentation algorithms.
arXiv Detail & Related papers (2021-04-10T18:34:42Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
- An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem.
State-of-the-art convex clustering algorithms require substantial computation and memory.
In this paper, we develop a very efficient smoothing gradient algorithm (Sproga) for convex clustering.
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)