Data structure > labels? Unsupervised heuristics for SVM hyperparameter
estimation
- URL: http://arxiv.org/abs/2111.02164v2
- Date: Thu, 22 Feb 2024 08:05:10 GMT
- Title: Data structure > labels? Unsupervised heuristics for SVM hyperparameter
estimation
- Authors: Michał Cholewa, Michał Romaszewski, Przemysław Głomb
- Abstract summary: Support Vector Machine is a de-facto reference for many Machine Learning approaches.
Parameter selection is usually achieved by a time-consuming grid search cross-validation procedure (GSCV).
We propose improved heuristics for SVM parameter selection and test them against GSCV and state-of-the-art heuristics on over 30 standard classification datasets.
- Score: 0.9208007322096532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification is one of the main areas of pattern recognition research, and
within it, Support Vector Machine (SVM) is one of the most popular methods
outside of the field of deep learning, and a de-facto reference for many Machine
Learning approaches. Its performance is determined by parameter selection,
which is usually achieved by a time-consuming grid search cross-validation
procedure (GSCV). That method, however, relies on the availability and quality
of labelled examples and can thus be hindered when those are limited. To
address that problem, there exist several unsupervised heuristics that take
advantage of the characteristics of the dataset to select parameters
instead of using class label information. While an order of magnitude faster,
they are scarcely used under the assumption that their results are
significantly worse than those of grid search. To challenge that assumption, we
have proposed improved heuristics for SVM parameter selection and tested them
against GSCV and state-of-the-art heuristics on over 30 standard classification
datasets. The results show not only their advantage over state-of-the-art
heuristics but also that they are statistically no worse than GSCV.
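The abstract does not detail the paper's improved heuristics, but the family of unsupervised heuristics it builds on can be illustrated with the well-known median heuristic for the RBF-kernel bandwidth, which estimates gamma from pairwise distances alone, with no labels and no cross-validation. A minimal sketch (function name and toy data are illustrative, not from the paper):

```python
import math
from itertools import combinations
from statistics import median

def median_heuristic_gamma(X):
    """Estimate the RBF-kernel gamma from data geometry alone (no labels).

    Classic median heuristic: set the kernel bandwidth sigma to the median
    pairwise Euclidean distance, then gamma = 1 / (2 * sigma**2).
    """
    # All unordered pairwise Euclidean distances (O(n^2) pairs).
    dists = [math.dist(a, b) for a, b in combinations(X, 2)]
    sigma = median(dists)
    return 1.0 / (2.0 * sigma ** 2)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
gamma = median_heuristic_gamma(X)
```

A single pass over the pairwise distances replaces the |grid| × k model fits that GSCV would run for the same parameter, which is where the order-of-magnitude speedup mentioned above comes from.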
Related papers
- Cost-sensitive probabilistic predictions for support vector machines [1.743685428161914]
Support vector machines (SVMs) are widely used and constitute one of the most thoroughly examined machine learning models.
We propose a novel approach to generate probabilistic outputs for the SVM.
arXiv Detail & Related papers (2023-10-09T11:00:17Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective way for GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Primal Estimated Subgradient Solver for SVM for Imbalanced Classification [0.0]
We aim to demonstrate that our cost-sensitive PEGASOS SVM achieves good performance on imbalanced data sets with a Majority to Minority Ratio ranging from 8.6:1 to 130:1.
We evaluate the performance by examining the learning curves.
We benchmark our PEGASOS Cost-Sensitive SVM's results against Ding's LINEAR SVM DECIDL method.
arXiv Detail & Related papers (2022-06-19T02:33:14Z)
- On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition [65.67315418971688]
We show that truncating small eigenvalues of the Global Covariance Pooling (GCP) can attain smoother gradients.
On fine-grained datasets, truncating the small eigenvalues would make the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
arXiv Detail & Related papers (2022-05-26T11:41:36Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification accuracy than previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- Classifications based on response times for detecting early-stage Alzheimer's disease [0.0]
This paper mainly describes a way to detect with high accuracy patients with early-stage Alzheimer's disease (ES-AD) versus healthy control (HC) subjects.
The solution presented in this paper makes two or even four times fewer errors than the best state-of-the-art results for HC/ES-AD classification from handwriting and drawing tasks.
arXiv Detail & Related papers (2021-02-01T10:08:08Z)
- Average Localised Proximity: a new data descriptor with good default one-class classification performance [4.894976692426517]
One-class classification is a challenging subfield of machine learning.
Data descriptors are used to predict membership of a class based solely on positive examples of that class.
arXiv Detail & Related papers (2021-01-26T19:14:14Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the naturally non-uniformly distributed data in real-world scenarios.
While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.