Related papers: Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification

URL: http://arxiv.org/abs/2310.10321v2
Date: Fri, 20 Oct 2023 10:30:49 GMT
Title: Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification
Authors: Junjie Dong, Mudi Jiang, Lianyu Hu, Zengyou He
Abstract summary: Existing pattern-based methods measure the discriminative power of each feature individually during the mining process. It is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. We propose Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets.
Score: 1.6693049653540362
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sequence classification has numerous applications in various fields. Despite extensive studies in the last decades, many challenges still exist, particularly in pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during the mining process, so they can miss combinations of features that have discriminative power. Furthermore, it is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to ensure consistency in the feature mining and classification procedure. Our method involves training an interpretable CNN encoder for sequential data and performing a gradient-based search for discriminative k-mer combinations. Experiments show that Hamming Encoder outperforms existing state-of-the-art methods in terms of classification accuracy.
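Illustration: the abstract mentions a Hamming distance-based similarity between k-mers and sequences. The sketch below shows one plausible reading of that idea in plain Python: each sequence is turned into a feature vector whose entries are the best Hamming similarity between a k-mer and any window of the sequence. It is not the authors' implementation (the paper uses a binarized 1DCNN and a gradient-based search), and the function names, the "best window" rule, and the handling of short sequences are assumptions made for this example.

```python
from __future__ import annotations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kmer_similarity(sequence: str, kmer: str) -> int:
    """Best match of `kmer` over all length-k windows of `sequence`.

    Similarity is k minus the smallest Hamming distance, so k means the
    k-mer occurs exactly and 0 means every window differs everywhere.
    """
    k = len(kmer)
    if len(sequence) < k:
        return 0  # assumption: sequences shorter than k get similarity 0
    best = min(
        hamming_distance(sequence[i:i + k], kmer)
        for i in range(len(sequence) - k + 1)
    )
    return k - best


def encode(sequence: str, kmers: list[str]) -> list[int]:
    """Feature vector: one Hamming similarity per k-mer (hypothetical helper)."""
    return [kmer_similarity(sequence, m) for m in kmers]


if __name__ == "__main__":
    # Toy k-mer set chosen by hand; a real system would mine these.
    print(encode("ACGTAC", ["CGT", "TTT", "GAC"]))
```

Using the same similarity for building features and for classification is what the abstract calls consistency between the mining and classification procedures; here it simply means that `encode` is the only place where sequences and k-mers are compared.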

Related papers

Triplet Loss Based Quantum Encoding for Class Separability [2.7641963278515114]
The encoding circuit is trained using a triplet loss function inspired by classical facial recognition algorithms. Benchmark tests performed on various binary classification tasks on MNIST and MedMNIST datasets demonstrate considerable improvement over amplitude encoding with the same VQC structure.
arXiv Detail & Related papers (2025-09-19T07:28:19Z)
Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only. We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning [21.088762527081883]
Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks. Recent approaches to incrementally learning the classifier by freezing the feature extractor after the first task have gained much attention. We explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify the features based on the Euclidean distance to the prototypes.
arXiv Detail & Related papers (2023-09-25T11:54:33Z)
Sparse-Inductive Generative Adversarial Hashing for Nearest Neighbor Search [8.020530603813416]
We propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH). SiGAH encodes large-scale high-dimensional features into binary codes, which solves the two problems through a generative adversarial training framework. Experimental results on four benchmarks, i.e., Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-12T08:07:23Z)
Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling. This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data. We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
Few-Shot Specific Emitter Identification via Deep Metric Ensemble Learning [26.581059299453663]
We propose a novel FS-SEI for aircraft identification via automatic dependent surveillance-broadcast (ADS-B) signals. Specifically, the proposed method consists of feature embedding and classification. Simulation results show that if the number of samples per category is more than 5, the average accuracy of our proposed method is higher than 98%.
arXiv Detail & Related papers (2022-07-14T01:09:22Z)
Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end. NSH sorts the Hamming distances of samples' hash codes and accordingly gathers their latent representations for self-supervised training. We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z)
A Comparative Evaluation of Quantification Methods [2.802657211770274]
Quantification represents the problem of estimating the distribution of class labels on unseen data. In this work, we compare 24 different methods on more than 40 data sets overall, considering binary as well as multiclass quantification settings. No single algorithm generally outperforms all competitors, but we identify a group of methods including the threshold selection-based Median Sweep and TSMax methods. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the HDx method, the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM
arXiv Detail & Related papers (2021-03-04T18:51:06Z)
Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search [90.30623718137244]
We propose a novel deep hashing method for scalable multi-label image search. A new rank-consistency objective is applied to align the similarity orders from two spaces. A powerful loss function is designed to penalize the samples whose semantic similarity and Hamming distance are mismatched.
arXiv Detail & Related papers (2021-02-02T13:46:58Z)
CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher discriminant ratio is maximized. Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z)
Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval. We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing. This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
Deterministic Decoding for Discrete Data in Variational Autoencoders [5.254093731341154]
We study a VAE model with a deterministic decoder (DD-VAE) for sequential data that selects the highest-scoring tokens instead of sampling. We demonstrate the performance of DD-VAE on multiple datasets, including molecular generation and optimization problems.
arXiv Detail & Related papers (2020-03-04T16:36:52Z)
Boosted Locality Sensitive Hashing: Discriminative Binary Codes for Source Separation [19.72987718461291]
We propose an adaptive boosting approach to learning locality sensitive hash codes, which represent audio spectra efficiently. We use the learned hash codes for single-channel speech denoising tasks as an alternative to a complex machine learning model.
arXiv Detail & Related papers (2020-02-14T20:10:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.