Primal Estimated Subgradient Solver for SVM for Imbalanced
Classification
- URL: http://arxiv.org/abs/2206.09311v6
- Date: Thu, 9 Nov 2023 19:05:57 GMT
- Title: Primal Estimated Subgradient Solver for SVM for Imbalanced
Classification
- Authors: John Sun
- Abstract summary: We aim to demonstrate that our cost sensitive PEGASOS SVM achieves good performance on imbalanced data sets with a Majority to Minority Ratio ranging from 8.6:1 to 130:1.
We evaluate the performance by examining the learning curves.
We benchmark our PEGASOS Cost-Sensitive SVM's results against Ding's LINEAR SVM DECIDL method.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We aim to demonstrate in experiments that our cost sensitive PEGASOS SVM
achieves good performance on imbalanced data sets with a Majority to Minority
Ratio ranging from 8.6:1 to 130:1, and to ascertain whether including an
intercept (bias), regularization, and parameter choices affects performance on
our selection of datasets. Although many resort to SMOTE methods, we aim for a less
computationally intensive method. We evaluate the performance by examining the
learning curves. These curves diagnose whether we overfit or underfit, or
whether the random sample of data chosen during the process was not random or
diverse enough in dependent-variable class for the algorithm to generalize to
unseen examples. We will also examine the behavior of the hyperparameters
versus the test and train error in validation curves. We benchmark our PEGASOS
Cost-Sensitive SVM's results against Ding's LINEAR SVM DECIDL method; Ding
obtained an ROC-AUC of 0.5 on one dataset. Our work will extend the work of
Ding by incorporating kernels into SVM. We will use Python rather than MATLAB,
as Python has dictionaries for storing mixed data types during multi-parameter
cross-validation.
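For concreteness, here is a minimal sketch of a cost-sensitive PEGASOS-style primal subgradient step in Python. The class costs, step-size schedule, and optional projection are standard PEGASOS ingredients, but the specific weighting scheme and hyperparameter values are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def pegasos_cost_sensitive(X, y, lam=1e-4, epochs=10,
                           cost_pos=10.0, cost_neg=1.0, seed=0):
    """Cost-sensitive PEGASOS-style primal subgradient solver (sketch).

    X: (n, d) feature matrix; y: labels in {-1, +1}. cost_pos/cost_neg
    weight the hinge loss of the minority (+1) and majority (-1) classes.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # PEGASOS step size
            cost = cost_pos if y[i] > 0 else cost_neg
            margin = y[i] * (w @ X[i])         # evaluate before the update
            w *= (1.0 - eta * lam)             # shrink from the L2 regularizer
            if margin < 1.0:                   # hinge loss is active
                w += eta * cost * y[i] * X[i]
            norm = np.linalg.norm(w)           # optional projection step
            if norm > 1.0 / np.sqrt(lam):
                w *= 1.0 / (np.sqrt(lam) * norm)
    return w
```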
Related papers
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since there are only a few available samples.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
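A hedged sketch of that idea in Python: treat the history of task-adapted parameter vectors as data and move the meta-parameters along their leading principal direction. The SVD computation, step size, and sign convention below are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

def main_direction_update(meta_params, history, lr=0.1):
    """Move meta-parameters along the main direction of past task solutions.

    history: list of task-adapted parameter vectors (one per past task).
    This is an illustrative sketch, not the exact Eigen-Reptile update.
    """
    H = np.stack(history) - meta_params        # displacements from meta-params
    # leading right singular vector = principal direction of the displacements
    _, _, vt = np.linalg.svd(H - H.mean(axis=0), full_matrices=False)
    direction = vt[0]
    if direction @ H.mean(axis=0) < 0:         # orient along the mean displacement
        direction = -direction
    return meta_params + lr * direction
```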
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Ensemble Methods for Robust Support Vector Machines using Integer Programming [0.0]
We study binary classification problems where we assume that our training data is subject to uncertainty.
To tackle this issue, the field of robust machine learning aims to develop models that are robust against small perturbations in the training data.
arXiv Detail & Related papers (2022-03-03T10:03:54Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Data structure > labels? Unsupervised heuristics for SVM hyperparameter estimation [0.9208007322096532]
The Support Vector Machine is a de facto reference for many Machine Learning approaches.
Parameter selection is usually achieved by a time-consuming grid search cross-validation procedure (GSCV).
We propose improved heuristics for SVM parameter selection and test them against GSCV and the state of the art on over 30 standard classification datasets.
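For context, the GSCV baseline referenced here (and the multi-parameter cross-validation mentioned in the abstract above) is typically run with scikit-learn's GridSearchCV, where a Python dictionary holds mixed hyperparameter types; the particular grid below is illustrative, not taken from either paper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A Python dict holds mixed hyperparameter types (floats and strings),
# which is the convenience the abstract above attributes to Python.
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "gamma": ["scale", 0.01, 0.1],
    "kernel": ["linear", "rbf"],
}
search = GridSearchCV(SVC(class_weight="balanced"), param_grid,
                      cv=5, scoring="roc_auc")
# search.fit(X_train, y_train)   # X_train, y_train: your labeled data
# print(search.best_params_)
```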
arXiv Detail & Related papers (2021-11-03T12:04:03Z)
- GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods to solve the problem, such as the Synthetic Minority Oversampling Technique (SMOTE).
In this paper, we propose a Gaussian-based minority oversampling technique (GMOTE) with a statistical perspective for imbalanced datasets.
When the GMOTE is combined with classification and regression tree (CART) or support vector machine (SVM), it shows better accuracy and F1-Score.
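A minimal sketch of the Gaussian-based oversampling idea, assuming a single Gaussian fit to the minority class; the published GMOTE additionally adapts tail probabilities for outliers, which this sketch omits.

```python
import numpy as np

def gaussian_oversample(X_min, n_new, shrink=0.5, seed=0):
    """Gaussian-based minority oversampling (simplified sketch).

    Fits one Gaussian to the minority class and samples synthetic
    points from it. `shrink` scales the covariance and is an assumed
    knob, not a parameter from the paper.
    """
    rng = np.random.default_rng(seed)
    mu = X_min.mean(axis=0)                    # minority-class mean
    cov = np.cov(X_min, rowvar=False) * shrink # shrunken covariance
    return rng.multivariate_normal(mu, cov, size=n_new)
```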
arXiv Detail & Related papers (2021-05-09T07:04:37Z)
- Accounting for Variance in Machine Learning Benchmarks [37.922783300635864]
Concluding that one machine-learning algorithm A outperforms another B ideally calls for multiple trials that optimize the learning pipeline over sources of variation.
This is prohibitively expensive, and corners are cut to reach conclusions.
We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization, and hyperparameter choice markedly impacts the results.
We show the counter-intuitive result that adding more sources of variation to an imperfect estimator brings it closer to the ideal estimator, at a 51-fold reduction in compute cost.
arXiv Detail & Related papers (2021-03-01T22:39:49Z)
- Estimating Average Treatment Effects with Support Vector Machines [77.34726150561087]
Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature.
We adapt SVM as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups.
We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods.
arXiv Detail & Related papers (2021-02-23T20:22:56Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
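A small sketch of the weighting idea: within each mini-batch, weights come from a softmax over the individual sample losses, so harder samples contribute more to the gradient step. The softmax form and the temperature `lam` are assumptions for illustration.

```python
import numpy as np

def absgd_weights(losses, lam=1.0):
    """Per-sample importance weights for one mini-batch (sketch).

    A softmax of the individual losses, scaled by the temperature
    `lam` (an assumed hyperparameter): higher-loss samples receive
    larger weight in the subsequent momentum-SGD gradient step.
    """
    z = losses / lam
    z -= z.max()                 # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Example: the hardest sample dominates the weighted gradient.
losses = np.array([0.2, 1.5, 0.1, 3.0])
print(absgd_weights(losses))
```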
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- AML-SVM: Adaptive Multilevel Learning with Support Vector Machines [0.0]
This paper proposes an adaptive multilevel learning framework for the nonlinear SVM.
It improves the classification quality across the refinement process, and leverages multi-threaded parallel processing for better performance.
arXiv Detail & Related papers (2020-11-05T00:17:02Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
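The replay idea can be sketched in a few lines: rather than updating on the most recent (correlated) sample from the chain, each step draws uniformly from a buffer of past samples. Buffer size, batch size, and step size below are illustrative assumptions.

```python
import numpy as np
from collections import deque

def sgd_experience_replay(stream, dim, buffer_size=1000, batch=8,
                          lr=0.01, seed=0):
    """SGD for least squares on a Markovian stream, with experience replay.

    Sampling uniformly from a buffer of past (x, y) pairs de-correlates
    consecutive updates; all constants here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    buf = deque(maxlen=buffer_size)
    w = np.zeros(dim)
    for x, y in stream:
        buf.append((x, y))
        # draw a mini-batch uniformly from the buffer, not the recent points
        idx = rng.integers(len(buf), size=min(batch, len(buf)))
        grad = np.zeros(dim)
        for j in idx:
            xj, yj = buf[j]
            grad += (w @ xj - yj) * xj      # squared-loss gradient
        w -= lr * grad / len(idx)
    return w
```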
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.