Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence
Classification
- URL: http://arxiv.org/abs/2310.10321v2
- Date: Fri, 20 Oct 2023 10:30:49 GMT
- Title: Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence
Classification
- Authors: Junjie Dong, Mudi Jiang, Lianyu Hu, Zengyou He
- Abstract summary: Existing pattern-based methods measure the discriminative power of each feature individually during the mining process.
It is difficult to ensure the overall discriminative performance after converting sequences into feature vectors.
We propose Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets.
- Score: 1.6693049653540362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence classification has numerous applications in various fields. Despite
extensive studies in the last decades, many challenges still exist,
particularly in pattern-based methods. Existing pattern-based methods measure
the discriminative power of each feature individually during the mining
process, so they can miss combinations of features that are discriminative
as a group. Furthermore, it is difficult to ensure the overall
discriminative performance after converting sequences into feature vectors. To
address these challenges, we propose a novel approach called Hamming Encoder,
which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture
to mine discriminative k-mer sets. In particular, we adopt a Hamming
distance-based similarity measure to ensure consistency in the feature mining
and classification procedure. Our method involves training an interpretable CNN
encoder for sequential data and performing a gradient-based search for
discriminative k-mer combinations. Experiments show that the Hamming Encoder
method proposed in this paper outperforms existing state-of-the-art methods in
terms of classification accuracy.
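The Hamming distance-based matching of k-mers mentioned in the abstract can be illustrated with a small sketch. This is not the authors' binarized 1DCNN or their gradient-based search; it only shows how a fixed set of k-mers could turn a discrete sequence into a binary feature vector. The function names, the `max_mismatches` threshold, and the example strings are assumptions made for illustration.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_kmer_distance(sequence: str, kmer: str) -> int:
    """Smallest Hamming distance between `kmer` and any window of `sequence`."""
    k = len(kmer)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the k-mer")
    return min(
        hamming_distance(sequence[i:i + k], kmer)
        for i in range(len(sequence) - k + 1)
    )


def encode(sequence: str, kmers: List[str], max_mismatches: int = 0) -> List[int]:
    """Binary features: 1 if some window lies within `max_mismatches` of the k-mer.

    `max_mismatches` is a hypothetical threshold chosen for this sketch.
    """
    return [
        int(min_kmer_distance(sequence, kmer) <= max_mismatches)
        for kmer in kmers
    ]


# Hypothetical example: one DNA-like sequence and three candidate 3-mers.
features = encode("ACGTTGCA", ["CGT", "GGG", "TGA"], max_mismatches=1)
print(features)  # [1, 0, 1]
```

Here `CGT` occurs exactly, `TGA` is one mismatch away from the window `TGC`, and `GGG` is at least two mismatches from every window, so only the first and third features are set.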
Related papers
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- FeCAM: Exploiting the Heterogeneity of Class Distributions in
Exemplar-Free Continual Learning [21.088762527081883]
Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks.
Recent approaches to incrementally learning the classifier by freezing the feature extractor after the first task have gained much attention.
We explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify the features based on the Euclidean distance to the prototypes.
arXiv Detail & Related papers (2023-09-25T11:54:33Z)
- Sparse-Inductive Generative Adversarial Hashing for Nearest Neighbor
Search [8.020530603813416]
We propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH).
SiGAH encodes large-scale high-dimensional features into binary codes, which solves the two problems through a generative adversarial training framework.
Experimental results on four benchmarks, i.e. Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-12T08:07:23Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for its shortcomings is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Few-Shot Specific Emitter Identification via Deep Metric Ensemble
Learning [26.581059299453663]
We propose a novel few-shot specific emitter identification (FS-SEI) method for aircraft identification via automatic dependent surveillance-broadcast (ADS-B) signals.
Specifically, the proposed method consists of feature embedding and classification.
Simulation results show that if the number of samples per category is more than 5, the average accuracy of our proposed method is higher than 98%.
arXiv Detail & Related papers (2022-07-14T01:09:22Z)
- Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end.
NSH sorts the Hamming distances of samples' hash codes and gathers their latent representations accordingly for self-supervised training.
We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z)
- Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search [90.30623718137244]
We propose a novel deep hashing method for scalable multi-label image search.
A new rank-consistency objective is applied to align the similarity orders from two spaces.
A powerful loss function is designed to penalize the samples whose semantic similarity and Hamming distance are mismatched.
arXiv Detail & Related papers (2021-02-02T13:46:58Z)
- High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance
Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher discriminant ratio is maximized.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and
Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
- Deterministic Decoding for Discrete Data in Variational Autoencoders [5.254093731341154]
We study a VAE model with a deterministic decoder (DD-VAE) for sequential data that selects the highest-scoring tokens instead of sampling.
We demonstrate the performance of DD-VAE on multiple datasets, including molecular generation and optimization problems.
arXiv Detail & Related papers (2020-03-04T16:36:52Z)
- Boosted Locality Sensitive Hashing: Discriminative Binary Codes for
Source Separation [19.72987718461291]
We propose an adaptive boosting approach to learning locality sensitive hash codes, which represent audio spectra efficiently.
We use the learned hash codes for single-channel speech denoising tasks as an alternative to a complex machine learning model.
arXiv Detail & Related papers (2020-02-14T20:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.