Related papers: Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data

Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data

URL: http://arxiv.org/abs/2406.06479v2
Date: Wed, 19 Jun 2024 21:34:07 GMT
Title: Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data
Authors: Nicole Hayes, Ekaterina Merkurjev, Guo-Wei Wei,
Abstract summary: We propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) techniques and a bidirectional transformer. We show that the proposed method performs better than competing algorithms even when the class imbalance ratio is very high.
Score: 1.3108652488669732
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data sets with imbalanced class sizes, often where one class size is much smaller than that of others, occur extremely often in various applications, including those with biological foundations, such as drug discovery and disease diagnosis. Thus, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to detect can result in heavy costs. However, many data classification algorithms do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this paper, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) techniques and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification problems on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed method not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer model based on an attention mechanism for self-supervised learning. Additionally, the method implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed model is validated using six molecular data sets, and we also provide a thorough comparison to other competing algorithms. The computational experiments show that the proposed method performs better than competing techniques even when the class imbalance ratio is very high.

Related papers

Unbiased Max-Min Embedding Classification for Transductive Few-Shot Learning: Clustering and Classification Are All You Need [83.10178754323955]
Few-shot learning enables models to generalize from only a few labeled examples. We propose the Unbiased Max-Min Embedding Classification (UMMEC) Method, which addresses the key challenges in few-shot learning. Our method significantly improves classification performance with minimal labeled data, advancing the state-of-the-art in annotatedL.
arXiv Detail & Related papers (2025-03-28T07:23:07Z)
A Hybrid Approach for Binary Classification of Imbalanced Data [0.0]
We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimentionality reduction, and ensemble learning. We evaluate the performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC.
arXiv Detail & Related papers (2022-07-06T15:18:41Z)
Instance-Dependent Label-Noise Learning with Manifold-Regularized Transition Matrix Estimation [172.81824511381984]
The transition matrix T(x) is unidentifiable under the instance-dependent noise(IDN) We propose assumption on the geometry of T(x) that "the closer two instances are, the more similar their corresponding transition matrices should be" Our method is superior to state-of-the-art approaches for label-noise learning under the challenging IDN.
arXiv Detail & Related papers (2022-06-06T04:12:01Z)
Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection [0.0]
Two new algorithms are proposed to address the class overlap issue in the dataset. The proposed design is evaluated for both binary and multi-category classification.
arXiv Detail & Related papers (2022-05-08T21:06:42Z)
CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance. Sample re-weighting methods are popularly used to alleviate this data bias issue. We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
IB-GAN: A Unified Approach for Multivariate Time Series Classification under Class Imbalance [1.854931308524932]
Non-parametric data augmentation with Generative Adversarial Networks (GANs) offers a promising solution. We propose Imputation Balanced GAN (IB-GAN), a novel method that joins data augmentation and classification in a one-step process via an imputation-balancing approach.
arXiv Detail & Related papers (2021-10-14T15:31:16Z)
Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare. In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples. To tackle this problem, we build a robust one-class classification framework via data refinement. We show that our method outperforms state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
Hybrid Ensemble optimized algorithm based on Genetic Programming for imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two classes of imbalanced data classification. Experimental results show the performance of the proposed method on the specified data sets in the size of the training set shows 40% and 50% better accuracy than other dimensions of the minority class prediction.
arXiv Detail & Related papers (2021-06-02T14:14:38Z)
Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning. Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch. ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modernally precise analysis of linear multiclass classification. Our analysis reveals that the classification accuracy is highly distribution-dependent. The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem. Two novel methods are proposed that exploit the geometric relationship between the feature vectors. The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z)
Machine Learning Pipeline for Pulsar Star Dataset [58.720142291102135]
This work brings together some of the most common machine learning (ML) algorithms. The objective is to make a comparison at the level of obtained results from a set of unbalanced data.
arXiv Detail & Related papers (2020-05-03T23:35:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.