Dynamic Classification: Leveraging Self-Supervised Classification to Enhance Prediction Performance
- URL: http://arxiv.org/abs/2502.18891v2
- Date: Fri, 30 May 2025 07:55:34 GMT
- Title: Dynamic Classification: Leveraging Self-Supervised Classification to Enhance Prediction Performance
- Authors: Ziyuan Zhong, Junyang Zhou
- Abstract summary: We propose an innovative dynamic classification algorithm aimed at achieving zero missed detections and minimal false positives. The algorithm partitions data in a self-supervised learning-generated way, which allows the model to learn from the training set. Experimental results show that, with minimal data partitioning errors, the algorithm achieves exceptional performance.
- Score: 2.2736104746143355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we propose an innovative dynamic classification algorithm aimed at achieving zero missed detections and minimal false positives, a critical requirement in safety-critical domains (e.g., medical diagnostics) where undetected cases risk severe outcomes. The algorithm partitions the data in a way generated by self-supervised learning: the model learns the data distribution from the training set and divides both the training set and the test set into N subareas, such that the training and test subsets in the same subarea share nearly the same boundary. Each subarea is assigned a model of the same type, such as a linear or random forest model, to predict the results for that subarea. In addition, the algorithm uses the subarea boundaries to refine predictions and filter out substandard results without requiring additional models. This approach allows each model to operate within a smaller data range and removes inaccurate predictions, thereby improving overall accuracy. Experimental results show that, with minimal data partitioning errors, the algorithm achieves exceptional performance with zero missed detections and minimal false positives, outperforming existing ensembles such as XGBoost and LightGBM. Even with larger classification errors, its performance remains comparable to that of state-of-the-art models. Key innovations include self-supervised classification learning, small-range subset predictions, and refining prediction results while eliminating unqualified ones without additional model support. Although the algorithm still has room for improvement in automatic parameter tuning and efficiency, it demonstrates outstanding performance across multiple datasets. Future work will focus on optimizing the classification components to enhance robustness and adaptability.
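To make the pipeline concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. K-means clustering stands in for the self-supervised partitioning, a random forest regressor is the per-subarea model, and the range of training targets seen in a subarea stands in for the boundary used to flag substandard predictions; all function names are hypothetical.

```python
# Illustrative sketch only (not the paper's code). KMeans stands in for the
# self-supervised partitioning, and the per-subarea target range stands in
# for the boundary-based filtering described in the abstract.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def fit_dynamic_models(X_train, y_train, n_subareas=5, seed=0):
    """Partition the training set into subareas and fit one model per subarea."""
    partitioner = KMeans(n_clusters=n_subareas, n_init=10, random_state=seed)
    labels = partitioner.fit_predict(X_train)
    models, bounds = {}, {}
    for k in range(n_subareas):
        mask = labels == k
        if not mask.any():
            continue  # guard against a rare empty cluster
        models[k] = RandomForestRegressor(random_state=seed).fit(X_train[mask], y_train[mask])
        # Target range observed in this subarea: used later as a stand-in
        # "boundary" to flag predictions that fall outside it.
        bounds[k] = (y_train[mask].min(), y_train[mask].max())
    return partitioner, models, bounds

def predict_with_filtering(partitioner, models, bounds, X_test):
    """Route each test point to its subarea, predict, and flag out-of-bound results."""
    labels = partitioner.predict(X_test)
    preds = np.full(len(X_test), np.nan)
    accepted = np.zeros(len(X_test), dtype=bool)
    for k, model in models.items():
        mask = labels == k
        if not mask.any():
            continue
        p = model.predict(X_test[mask])
        lo, hi = bounds[k]
        preds[mask] = p
        accepted[mask] = (p >= lo) & (p <= hi)
    return preds, accepted
```

In use, predictions flagged as out-of-bound would be withheld or re-examined rather than reported, which is how the sketch mirrors the abstract's filtering step without requiring an additional model.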
Related papers
- Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER Dataset [0.0]
This study investigates the effectiveness of several machine learning algorithms for static malware detection using the EMBER dataset. We evaluate eight classification models: LightGBM, XGBoost, CatBoost, Random Forest, Extra Trees, HistGradientBoosting, k-Nearest Neighbors (KNN), and TabNet. The models are assessed on accuracy, precision, recall, F1 score, and AUC to examine both predictive performance and robustness.
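A rough illustration of such an evaluation loop follows (an assumption-laden sketch, not the study's code: it uses a few scikit-learn models as stand-ins for the full set of eight and assumes a binary benign/malicious label).

```python
# Illustrative multi-model evaluation loop; the metrics mirror those named in
# the abstract, but this is not the authors' code.
from sklearn.ensemble import (ExtraTreesClassifier, HistGradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate_models(X_train, y_train, X_test, y_test):
    """Fit several classifiers and report accuracy, precision, recall, F1, and AUC."""
    models = {
        "RandomForest": RandomForestClassifier(random_state=0),
        "ExtraTrees": ExtraTreesClassifier(random_state=0),
        "HistGradientBoosting": HistGradientBoostingClassifier(random_state=0),
        "KNN": KNeighborsClassifier(),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred),
            "recall": recall_score(y_test, pred),
            "f1": f1_score(y_test, pred),
            "auc": roc_auc_score(y_test, prob),
        }
    return results
```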
arXiv Detail & Related papers (2025-07-22T18:45:10Z) - Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments [5.5855749614100825]
This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate the recall reduction such models suffer in novel environments. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem. Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect models in challenging, novel scenarios.
arXiv Detail & Related papers (2025-05-25T23:17:47Z) - Boosting of Classification Models with Human-in-the-Loop Computational Visual Knowledge Discovery [2.9465623430708905]
This paper proposes moving boosting methodology from focusing only on misclassified cases to all cases in the class overlap areas. A Divide and Classify process splits cases into simple and complex ones, classifying them individually through computational analysis and data visualization. After finding pure and overlap class areas, simple cases in pure areas are classified, generating interpretable sub-models such as decision rules in Propositional and First-order Logics.
arXiv Detail & Related papers (2025-02-10T21:09:19Z) - Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combining score matching. We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy. Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - DRoP: Distributionally Robust Data Pruning [11.930434318557156]
We conduct the first systematic study of the impact of data pruning on the classification bias of trained models. We propose DRoP, a distributionally robust approach to pruning, and empirically demonstrate its performance on standard computer vision benchmarks.
arXiv Detail & Related papers (2024-04-08T14:55:35Z) - Less is More: Fewer Interpretable Region via Submodular Subset Selection [54.07758302264416]
This paper re-models the above image attribution problem as a submodular subset selection problem.
We construct a novel submodular function to discover more accurate small interpretation regions.
For correctly predicted samples, the proposed method improves the Deletion and Insertion scores with an average of 4.9% and 2.5% gain relative to HSIC-Attribution.
arXiv Detail & Related papers (2024-02-14T13:30:02Z) - Towards Better Certified Segmentation via Diffusion Models [62.21617614504225]
Segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving.
Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees.
In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models.
arXiv Detail & Related papers (2023-06-16T16:30:39Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
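For orientation, a plain split-conformal backbone is sketched below (an illustrative assumption, not the paper's method); the paper's contribution would enter where the nonconformity scores are computed, by feeding the self-supervised error estimate in as an additional feature rather than using the raw residual alone.

```python
# Minimal split conformal prediction for regression; the paper augments the
# nonconformity-score step with a self-supervised error estimate, omitted here.
import numpy as np

def split_conformal_intervals(model, X_cal, y_cal, X_test, alpha=0.1):
    """Calibrate absolute residuals on held-out data and form symmetric intervals."""
    scores = np.sort(np.abs(y_cal - model.predict(X_cal)))
    n = len(scores)
    rank = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
    q = scores[rank]  # conservative (1 - alpha) empirical quantile
    center = model.predict(X_test)
    return center - q, center + q
```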
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z) - Efficient and Differentiable Conformal Prediction with General Function
Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Diversity Enhanced Active Learning with Strictly Proper Scoring Rules [4.81450893955064]
We study acquisition functions for active learning (AL) for text classification.
We convert the Expected Loss Reduction (ELR) method to estimate the increase in (strictly proper) scores like log probability or negative mean square error.
We show that the use of mean square error and log probability with BEMPS yields robust acquisition functions.
arXiv Detail & Related papers (2021-10-27T05:02:11Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Quantum-Assisted Feature Selection for Vehicle Price Prediction Modeling [0.0]
We study metrics for encoding the search as a binary model, such as Generalized Mean Information Coefficient and Pearson Correlation Coefficient.
We achieve accuracy scores of 0.9 for finding optimal subsets on synthetic data using a new metric that we define.
Our findings show that by leveraging quantum-assisted routines we find solutions that increase the quality of predictive model output.
arXiv Detail & Related papers (2021-04-08T20:48:44Z) - ALEX: Active Learning based Enhancement of a Model's Explainability [34.26945469627691]
An active learning (AL) algorithm seeks to construct an effective classifier with a minimal number of labeled examples in a bootstrapping manner.
In the era of data-driven learning, this is an important research direction to pursue.
This paper describes our work-in-progress towards developing an AL selection function that in addition to model effectiveness also seeks to improve on the interpretability of a model during the bootstrapping steps.
arXiv Detail & Related papers (2020-09-02T07:15:39Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Efficient Ensemble Model Generation for Uncertainty Estimation with
Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z) - Dynamic Decision Boundary for One-class Classifiers applied to
non-uniformly Sampled Data [0.9569316316728905]
A typical issue in pattern recognition is non-uniformly sampled data.
In this paper, we propose a one-class classifier based on the minimum spanning tree with a dynamic decision boundary.
arXiv Detail & Related papers (2020-04-05T18:29:36Z) - Discrete-Valued Latent Preference Matrix Estimation with Graph Side
Information [12.836994708337144]
We develop an algorithm that matches the optimal sample complexity.
Our algorithm is robust to model errors and outperforms the existing algorithms in terms of prediction performance.
arXiv Detail & Related papers (2020-03-16T06:29:24Z) - A Non-Intrusive Correction Algorithm for Classification Problems with
Corrupted Data [3.908426668574935]
A novel correction algorithm is proposed for multi-class classification problems with corrupted training data.
The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction.
arXiv Detail & Related papers (2020-02-11T20:07:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.