A Non-Intrusive Correction Algorithm for Classification Problems with Corrupted Data
- URL: http://arxiv.org/abs/2002.04658v1
- Date: Tue, 11 Feb 2020 20:07:05 GMT
- Title: A Non-Intrusive Correction Algorithm for Classification Problems with Corrupted Data
- Authors: Jun Hou, Tong Qin, Kailiang Wu, Dongbin Xiu
- Abstract summary: A novel correction algorithm is proposed for multi-class classification problems with corrupted training data.
The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction.
- Score: 3.908426668574935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A novel correction algorithm is proposed for multi-class classification
problems with corrupted training data. The algorithm is non-intrusive, in the
sense that it post-processes a trained classification model by adding a
correction procedure to the model prediction. The correction procedure can be
coupled with any approximator, such as logistic regression or neural networks
of various architectures. When the training dataset is sufficiently large, we
prove that the corrected models deliver correct classification results as if
there were no corruption in the training data. For datasets of finite size, the
corrected models produce significantly better recovery results than models
without the correction algorithm. All of the theoretical findings in the paper
are verified by our numerical examples.
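The abstract does not spell out the correction step. As a rough illustration only, the sketch below applies a standard backward correction to a trained model's output probabilities, assuming the label corruption is described by a known row-stochastic matrix C with C[i, j] = P(observed label j | true label i); this is a generic stand-in, not necessarily the authors' exact procedure.

```python
import numpy as np

def correct_probabilities(p_corrupted, C):
    """Non-intrusively correct the softmax output of a model trained on
    corrupted labels, assuming it approximates C.T @ p_true with
    C[i, j] = P(observed label j | true label i)."""
    p_true = np.linalg.solve(C.T, p_corrupted)
    p_true = np.clip(p_true, 0.0, None)   # numerical cleanup
    return p_true / p_true.sum()          # renormalize onto the simplex

# Example: 3 classes with 20% symmetric label corruption (assumed known).
K, eps = 3, 0.2
C = (1 - eps) * np.eye(K) + eps / (K - 1) * (1 - np.eye(K))
p_model = np.array([0.5, 0.3, 0.2])       # prediction of the trained model
p_hat = correct_probabilities(p_model, C)
print(p_hat, p_hat.argmax())              # corrected probabilities and label
```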
Related papers
- Dynamic Classification: Leveraging Self-Supervised Classification to Enhance Prediction Performance [2.2736104746143355]
We propose an innovative dynamic classification algorithm designed to achieve zero missed detections and minimal false positives.
The algorithm partitions the data into N equivalent training subsets and N prediction subsets using a supervised model, followed by independent predictions from N separate predictive models.
Experimental results demonstrate that, when data partitioning errors are minimal, the dynamic classification algorithm achieves exceptional performance with zero missed detections and minimal false positives.
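A minimal sketch of the partition-and-vote structure described above, with scikit-learn's KFold standing in for the paper's supervised partitioning model (an assumption, not the authors' scheme):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def dynamic_classify(X_train, y_train, X_test, n_parts=5):
    """Train one model per disjoint data partition and aggregate the N
    independent predictions by majority vote (labels are assumed to be
    non-negative integers)."""
    votes = []
    kf = KFold(n_splits=n_parts, shuffle=True, random_state=0)
    for _, idx in kf.split(X_train):      # each fold is one training subset
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train[idx], y_train[idx])
        votes.append(model.predict(X_test))
    votes = np.stack(votes).astype(int)   # shape (n_parts, n_test)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```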
arXiv Detail & Related papers (2025-02-26T07:11:12Z) - A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
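A toy sketch of corruption-tolerant mirror descent on the probability simplex, using multiplicative (negative-entropy mirror map) updates and a coordinate-wise median across workers as the robust aggregator; the paper's exact aggregation rule may differ:

```python
import numpy as np

def robust_mirror_descent(worker_grads, x0, eta=0.1, steps=200):
    """Mirror descent on the probability simplex (entropy mirror map gives
    multiplicative updates), with a coordinate-wise median over workers as
    a simple corruption-tolerant aggregation rule."""
    x = x0.copy()
    for _ in range(steps):
        g = np.median(worker_grads(x), axis=0)  # robust gradient aggregate
        x = x * np.exp(-eta * g)                # multiplicative (dual) step
        x /= x.sum()                            # back onto the simplex
    return x

# Toy usage: three honest workers minimizing ||x - target||^2, one corrupted.
target = np.array([0.7, 0.2, 0.1])
def worker_grads(x):
    honest = np.tile(2 * (x - target), (3, 1))
    corrupted = 1e3 * np.ones((1, x.size))      # adversarial gradients
    return np.vstack([honest, corrupted])

print(robust_mirror_descent(worker_grads, np.ones(3) / 3))  # ~ target
```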
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
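The stated equivalence can be illustrated with one simple adaptive linear model: zero-impute missing entries and append the missingness mask as extra features, so the fitted mask coefficients implicitly encode per-feature imputation constants (a sketch under these assumptions, not the paper's full method):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adaptive_features(X):
    """Zero-impute missing entries and append the missingness mask."""
    mask = np.isnan(X).astype(float)
    return np.hstack([np.nan_to_num(X, nan=0.0), mask])

def fit_adaptive_linear(X, y):
    """The coefficients learned for the mask columns act as per-feature
    imputation constants, so the imputation rule and the downstream
    regression are learned simultaneously."""
    return LinearRegression().fit(adaptive_features(X), y)

# Prediction must apply the same transformation:
# model = fit_adaptive_linear(X_train, y_train)
# y_hat = model.predict(adaptive_features(X_test))
```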
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - Continual learning for surface defect segmentation by subnetwork
creation and selection [55.2480439325792]
We introduce a new continual (or lifelong) learning algorithm that performs segmentation tasks without undergoing catastrophic forgetting.
The method is applied to two different surface defect segmentation problems that are learned incrementally.
Our approach achieves results comparable to joint training, in which all the training data (all defects) are seen simultaneously.
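A toy sketch of the subnetwork idea: keep a frozen shared weight matrix and train only a per-task binary mask, so earlier tasks cannot be overwritten. How the mask scores are learned is elided here; `register_task` and its `scores` argument are hypothetical simplifications:

```python
import numpy as np

class SubnetworkSelector:
    """One frozen, shared weight matrix; each task trains only a binary
    mask that selects its own subnetwork. Because the shared weights never
    change, earlier tasks are not catastrophically forgotten."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((in_dim, out_dim))  # frozen weights
        self.masks = {}                                  # task_id -> 0/1 mask

    def register_task(self, task_id, scores, keep=0.5):
        # `scores` are hypothetical per-weight importance scores for the
        # task (in practice learned, e.g., with straight-through gradients).
        thresh = np.quantile(scores, 1.0 - keep)
        self.masks[task_id] = (scores >= thresh).astype(float)

    def forward(self, x, task_id):
        return x @ (self.W * self.masks[task_id])
```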
arXiv Detail & Related papers (2023-12-08T15:28:50Z) - Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
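One way to exploit the alignment between original and corrected sentences is to derive token-level edit tags as an auxiliary sequence-labeling task; the sketch below uses Python's standard-library difflib and is an assumed simplification, not the paper's exact task definition:

```python
import difflib

def edit_tags(source_tokens, corrected_tokens):
    """Token-level edit labels derived from the alignment between an
    original sentence and its correction; usable as an auxiliary
    sequence-labeling objective next to the main seq2seq loss."""
    tags = []
    sm = difflib.SequenceMatcher(a=source_tokens, b=corrected_tokens)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            tags += ["KEEP"] * (i2 - i1)
        elif op == "replace":
            tags += ["REPLACE"] * (i2 - i1)
        elif op == "delete":
            tags += ["DELETE"] * (i2 - i1)
        elif op == "insert" and tags:
            tags[-1] = "INSERT_AFTER"   # token preceding the insertion
    return tags

print(edit_tags("He go to school".split(), "He goes to school".split()))
# -> ['KEEP', 'REPLACE', 'KEEP', 'KEEP']
```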
arXiv Detail & Related papers (2023-11-20T14:50:12Z) - Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
This research investigates the use of machine learning algorithms to impute missing values in categorical datasets.
The emphasis is on ensemble models constructed with the Error-Correcting Output Codes (ECOC) framework.
Despite these encouraging results, deep learning for missing-data imputation faces obstacles, including the requirement for large amounts of labeled data.
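A minimal sketch of ECOC-based imputation using scikit-learn's OutputCodeClassifier, assuming categories are integer-encoded and missing entries are marked -1 (the encoding choices are assumptions, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

def impute_column(X, col):
    """Impute missing values (encoded as -1) in categorical column `col`
    with an Error-Correcting Output Codes ensemble trained on the rows
    where that column is observed. X is an integer-encoded array."""
    observed = X[:, col] != -1
    features = np.delete(X, col, axis=1)
    ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                                code_size=2.0, random_state=0)
    ecoc.fit(features[observed], X[observed, col])
    X = X.copy()
    if (~observed).any():
        X[~observed, col] = ecoc.predict(features[~observed])
    return X
```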
arXiv Detail & Related papers (2023-06-10T03:29:48Z) - Informative regularization for a multi-layer perceptron RR Lyrae
classifier under data shift [3.303002683812084]
We propose a scalable and easily adaptable approach based on informative regularization and an ad hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
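One simple instance of informative regularization, sketched here for logistic regression rather than the paper's multi-layer perceptron: penalize deviation of the weights on known characteristic features from prior values derived from domain knowledge (the penalty form and `w_prior` are assumptions for illustration):

```python
import numpy as np

def informative_logreg(X, y, w_prior, lam=0.1, eta=0.1, steps=1000):
    """Logistic regression whose loss adds lam * ||w - w_prior||^2, pulling
    the weights of known characteristic features toward prior values."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))                  # predictions
        grad = X.T @ (p - y) / len(y) + 2 * lam * (w - w_prior)
        w -= eta * grad
    return w
```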
arXiv Detail & Related papers (2023-03-12T02:49:19Z) - A novel approach for wafer defect pattern classification based on
topological data analysis [0.0]
In semiconductor manufacturing, wafer-map defect patterns provide critical information for facility maintenance and yield management.
We propose a novel way to represent the shape of the defect pattern as a finite-dimensional vector, which will be used as an input for a neural network algorithm for classification.
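A hedged sketch of the general TDA recipe (not necessarily the paper's exact pipeline): compute persistence diagrams of the defect point cloud with the third-party ripser library and vectorize feature lifetimes into fixed-length histograms:

```python
import numpy as np
from ripser import ripser  # assumed third-party TDA dependency

def defect_pattern_vector(points, n_bins=16, maxdim=1):
    """Persistence diagrams of the wafer-defect point cloud, vectorized as
    fixed-length histograms of feature lifetimes (assumes coordinates are
    scaled to [0, 1]); the result can feed a standard classifier."""
    dgms = ripser(points, maxdim=maxdim)["dgms"]
    feats = []
    for dgm in dgms:                          # one diagram per homology dim
        finite = np.isfinite(dgm[:, 1])
        lifetimes = dgm[finite, 1] - dgm[finite, 0]
        hist, _ = np.histogram(lifetimes, bins=n_bins, range=(0.0, 1.0))
        feats.append(hist)
    return np.concatenate(feats)              # finite-dimensional vector
```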
arXiv Detail & Related papers (2022-09-19T11:54:13Z) - Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of Modular Conformal Calibration (MCC) on 17 regression datasets.
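The conformal step behind such recalibration can be sketched with split conformal regression: hold out a calibration set and use its residual quantile to wrap any fitted regressor in finite-sample-valid intervals (a generic sketch, not MCC itself):

```python
import numpy as np

def conformal_intervals(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal regression: turn any fitted point predictor into
    (1 - alpha) prediction intervals using held-out residuals."""
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(residuals)
    # Finite-sample-corrected empirical quantile of the residuals.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, level)
    mu = model.predict(X_test)
    return mu - q, mu + q
```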
arXiv Detail & Related papers (2022-06-23T03:25:23Z) - Regularized Classification-Aware Quantization [39.04839665081476]
We present a class of algorithms that learn distributed quantization schemes for binary classification tasks.
Our method is called Regularized Classification-Aware Quantization.
arXiv Detail & Related papers (2021-07-12T21:27:48Z) - Evaluating State-of-the-Art Classification Models Against Bayes
Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
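Given exact class-conditional densities (as a normalizing flow provides) and class priors, the Bayes error is E_x[1 - max_k p(k | x)] and can be estimated by Monte Carlo; in the sketch below, Gaussians stand in for learned flows:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_error_mc(class_dists, priors, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the Bayes error, E_x[1 - max_k p(k | x)],
    from exact class-conditional densities and class priors."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n_samples, priors)   # samples per class
    err = 0.0
    for k, (dist, n_k) in enumerate(zip(class_dists, counts)):
        x = dist.rvs(size=n_k, random_state=seed + k)
        joint = np.stack([p * d.pdf(x) for d, p in zip(class_dists, priors)])
        err += (1.0 - joint.max(axis=0) / joint.sum(axis=0)).sum()
    return err / n_samples

# Two unit Gaussians two units apart: Bayes error is Phi(-1) ~ 0.159.
dists = [multivariate_normal([0.0, 0.0]), multivariate_normal([2.0, 0.0])]
print(bayes_error_mc(dists, [0.5, 0.5]))
```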
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.