ClaPIM: Scalable Sequence CLAssification using Processing-In-Memory
- URL: http://arxiv.org/abs/2302.08284v2
- Date: Sun, 5 Nov 2023 08:58:47 GMT
- Title: ClaPIM: Scalable Sequence CLAssification using Processing-In-Memory
- Authors: Marcel Khalifa, Barak Hoffer, Orian Leitersdorf, Robert Hanhan, Ben
Perach, Leonid Yavits, and Shahar Kvatinsky
- Abstract summary: ClaPIM is a scalable DNA sequence classification architecture based on the emerging concept of hybrid in-crossbar and near-crossbar memristive processing-in-memory (PIM).
Compared with Kraken2, ClaPIM provides significantly higher classification quality (up to 20x improvement in F1 score) and also demonstrates a 1.8x throughput improvement.
- Score: 1.6124241068249217
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: DNA sequence classification is a fundamental task in computational biology
with vast implications for applications such as disease prevention and drug
design. Therefore, fast, high-quality sequence classifiers are critically
important. This paper introduces ClaPIM, a scalable DNA sequence classification
architecture based on the emerging concept of hybrid in-crossbar and
near-crossbar memristive processing-in-memory (PIM). We enable efficient and
high-quality classification by uniting the filter and search stages within a
single algorithm. Specifically, we propose a custom filtering technique that
drastically narrows the search space and a search approach that facilitates
approximate string matching through a distance function. ClaPIM is the first
PIM architecture for scalable approximate string matching that benefits from
the high density of memristive crossbar arrays and the massive computational
parallelism of PIM. Compared with Kraken2, a state-of-the-art software
classifier, ClaPIM provides significantly higher classification quality (up to
20x improvement in F1 score) and also demonstrates a 1.8x throughput
improvement. Compared with EDAM, a recently-proposed SRAM-based accelerator
that is restricted to small datasets, we observe both a 30.4x improvement in
normalized throughput per area and a 7% increase in classification precision.
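The abstract describes the algorithmic core as a filter stage that drastically narrows the candidate search space, followed by a search stage that performs approximate string matching through a distance function. ClaPIM executes this pipeline inside memristive crossbar arrays; the minimal Python sketch below only illustrates the filter-then-search idea in software, and the k-mer filter, the plain Levenshtein distance used as the distance function, and the toy reference labels are illustrative assumptions rather than the paper's exact design.

```python
from collections import defaultdict

def kmers(seq, k):
    """Enumerate all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_kmer_index(references, k):
    """Map each k-mer to the reference labels containing it (filter stage)."""
    index = defaultdict(set)
    for label, ref in references.items():
        for km in kmers(ref, k):
            index[km].add(label)
    return index

def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance (illustrative distance function)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def classify_read(read, references, index, k, min_shared=2):
    """Filter: keep references sharing enough k-mers with the read.
    Search: approximate string matching against the surviving candidates."""
    votes = defaultdict(int)
    for km in kmers(read, k):
        for label in index.get(km, ()):
            votes[label] += 1
    candidates = [lab for lab, v in votes.items() if v >= min_shared]
    if not candidates:
        return None  # unclassified
    # pick the candidate reference closest to the read
    return min(candidates, key=lambda lab: edit_distance(read, references[lab]))

# toy usage: two short "reference genomes" and one noisy read
refs = {"speciesA": "ACGTACGTGGTTAACC", "speciesB": "TTGGCCAATTGGCCAA"}
idx = build_kmer_index(refs, k=4)
print(classify_read("ACGTACCTGGTTAACC", refs, idx, k=4))  # -> speciesA
```

The point of the PIM design, per the abstract, is that the per-candidate comparisons of a pipeline like this can be evaluated in parallel across the many references stored in the dense crossbar arrays, rather than sequentially as in this software sketch.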
Related papers
- Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO [3.9154800026646566]
This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification.
ECHO targets both classification time and memory utilization and incorporates two innovative techniques.
arXiv Detail & Related papers (2024-06-03T23:54:48Z)
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, the nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, Maximizing Optimized Kernel Dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- POCKET: Pruning Random Convolution Kernels for Time Series Classification from a Feature Selection Perspective [8.359327841946852]
A time series classification model, POCKET, is designed to efficiently prune redundant kernels.
Experimental results on diverse time series datasets show that POCKET prunes up to 60% of kernels without a significant reduction in accuracy and performs 11× faster than its counterparts.
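For context on the summary above: random-kernel classifiers of this family convolve each series with many random kernels and feed pooled responses to a linear model, and pruning "from a feature selection perspective" means discarding kernels whose features the classifier does not use. The sketch below is not the POCKET algorithm itself, only a hedged illustration under simple assumptions: random 1-D kernels, max-pooled features, and an L1-penalized logistic regression whose zeroed weights mark prunable kernels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def random_kernels(n_kernels, length=9):
    """Draw random 1-D convolution kernels."""
    return [rng.standard_normal(length) for _ in range(n_kernels)]

def transform(X, kernels):
    """One max-pooled feature per (series, kernel): max of the valid convolution."""
    feats = np.empty((len(X), len(kernels)))
    for i, x in enumerate(X):
        for j, k in enumerate(kernels):
            feats[i, j] = np.convolve(x, k, mode="valid").max()
    return feats

# toy data: class 0 = clean sine, class 1 = noisy sine
t = np.linspace(0, 6.28, 100)
X = [np.sin(t) + rng.normal(0, s, t.size) for s in [0.05] * 30 + [0.5] * 30]
y = np.array([0] * 30 + [1] * 30)

kernels = random_kernels(200)
F = transform(X, kernels)

# feature selection: the L1 penalty drives weights of redundant kernels to zero
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(F, y)
keep = np.flatnonzero(np.abs(clf.coef_).ravel() > 1e-6)
print(f"kept {keep.size}/{len(kernels)} kernels")
pruned_model_kernels = [kernels[j] for j in keep]  # the rest can be discarded
```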
arXiv Detail & Related papers (2023-09-15T16:03:23Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
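To make the two-branch idea concrete, here is a toy PyTorch sketch. It is an assumption-laden simplification, not the authors' Dyn-Perceiver: a convolutional feature branch, a classification branch that updates a latent vector from pooled stage features (simple additive fusion stands in for the attention-based interaction of the actual model), and early-exit heads attached only to the classification branch.

```python
import torch
import torch.nn as nn

class TwoBranchEarlyExit(nn.Module):
    """Toy two-branch classifier with early exits on the classification branch."""

    def __init__(self, num_classes=10, latent_dim=64):
        super().__init__()
        chans = [3, 16, 32, 64]
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                          nn.ReLU())
            for i in range(3)])
        # project pooled features of each stage into the latent code
        self.fuse = nn.ModuleList([nn.Linear(chans[i + 1], latent_dim) for i in range(3)])
        self.latent_blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU()) for _ in range(3)])
        self.exits = nn.ModuleList([nn.Linear(latent_dim, num_classes) for _ in range(3)])

    def forward(self, x, threshold=0.9):
        z = x.new_zeros(x.size(0), self.exits[0].in_features)   # latent code
        for stage, fuse, block, head in zip(self.stages, self.fuse,
                                            self.latent_blocks, self.exits):
            x = stage(x)                      # feature branch
            pooled = x.mean(dim=(2, 3))       # global average pooling
            z = block(z + fuse(pooled))       # classification branch update
            logits = head(z)                  # early-exit head
            # at inference (batch of one), stop once the prediction is confident
            if not self.training and logits.softmax(-1).max() >= threshold:
                return logits
        return logits

model = TwoBranchEarlyExit().eval()
with torch.no_grad():
    print(model(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])
```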
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For Hidden Markov Models [70.26374282390401]
Decoding the original signal (i.e., the hidden chain) from the noisy observations is one of the main goals in nearly all HMM-based data analyses.
We present Quick Adaptive Ternary (QATS), a divide-and-conquer procedure which decodes the hidden sequence in polylogarithmic computational complexity.
arXiv Detail & Related papers (2023-05-29T19:37:48Z)
- Resource saving taxonomy classification with k-mer distributions and machine learning [2.0196229393131726]
We propose to use k-mer distributions obtained from DNA sequences as features to classify their taxonomic origin.
We show that our approach improves the classification on the genus level and achieves comparable results for the superkingdom and phylum level.
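A minimal sketch of the k-mer-distribution idea summarized above, under illustrative assumptions (k = 4, a random-forest classifier, toy sequences, none of which are the paper's setup): each DNA sequence is represented by its normalized frequency vector over all 4^k possible k-mers, and an off-the-shelf classifier is trained on those vectors.

```python
from itertools import product
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def kmer_distribution(seq, k=4):
    """Normalized frequency vector over all 4**k possible DNA k-mers."""
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = np.zeros(len(vocab))
    for i in range(len(seq) - k + 1):
        idx = vocab.get(seq[i:i + k])
        if idx is not None:          # skip k-mers with ambiguous bases (e.g. 'N')
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts

# illustrative training data: (sequence, taxonomic label) pairs
train = [("ACGTACGTACGTAAGG", "genusA"), ("GGGCCCGGGCCCGGAA", "genusB"),
         ("ACGTACGAACGTAAGG", "genusA"), ("GGCCCCGGGCCAGGAA", "genusB")]
X = np.stack([kmer_distribution(s) for s, _ in train])
y = [label for _, label in train]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([kmer_distribution("ACGTACGTACGTATGG")]))  # expected: ['genusA']
```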
arXiv Detail & Related papers (2023-03-10T08:01:08Z)
- Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z)
- Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate [4.812468844362369]
We introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class.
We show that classification decisions made by simply sorting objects across classes in descending order of their mLPRs can, in theory, ensure the class hierarchy.
In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy.
arXiv Detail & Related papers (2022-05-16T17:43:35Z)
- Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
A key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.
We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., the Pairwise Masked Language Model (PMLM).
Our results show that the proposed method effectively captures the inter-residue correlations and improves the performance of contact prediction by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
- Online Multi-Object Tracking and Segmentation with GMPHD Filter and Mask-based Affinity Fusion [79.87371506464454]
We propose a fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input.
The proposed method is based on the Gaussian mixture probability hypothesis density (GMPHD) filter, a hierarchical data association (HDA), and a mask-based affinity fusion (MAF) model.
In experiments on two popular MOTS datasets, the key modules each show improvements.
arXiv Detail & Related papers (2020-08-31T21:06:22Z)