Data structure > labels? Unsupervised heuristics for SVM hyperparameter
estimation
- URL: http://arxiv.org/abs/2111.02164v2
- Date: Thu, 22 Feb 2024 08:05:10 GMT
- Title: Data structure > labels? Unsupervised heuristics for SVM hyperparameter
estimation
- Authors: Michał Cholewa, Michał Romaszewski, Przemysław Głomb
- Abstract summary: Support Vector Machine is a de-facto reference for many Machine Learning approaches.
Parameter selection is usually achieved by a time-consuming grid search cross-validation procedure (GSCV).
We propose improved heuristics for SVM parameter selection and test them against GSCV and state-of-the-art heuristics on over 30 standard classification datasets.
- Score: 0.9208007322096532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification is one of the main areas of pattern recognition research, and
within it, Support Vector Machine (SVM) is one of the most popular methods
outside of the field of deep learning, and a de-facto reference for many Machine
Learning approaches. Its performance is determined by parameter selection,
which is usually achieved by a time-consuming grid search cross-validation
procedure (GSCV). That method, however, relies on the availability and quality
of labelled examples and can thus be hindered when those are limited. To
address that problem, there exist several unsupervised heuristics that take
advantage of the characteristics of the dataset to select parameters
instead of using class label information. While an order of magnitude faster,
they are scarcely used under the assumption that their results are
significantly worse than those of grid search. To challenge that assumption, we
have proposed improved heuristics for SVM parameter selection and tested them
against GSCV and state-of-the-art heuristics on over 30 standard classification
datasets. The results show not only their advantage over state-of-the-art
heuristics but also that they are statistically no worse than GSCV.
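The abstract does not detail the paper's improved heuristics, but the family of unsupervised heuristics it builds on can be illustrated with the well-known median heuristic for the RBF-kernel bandwidth, which estimates gamma from pairwise distances alone, with no labels and no cross-validation. A minimal sketch (function name and toy data are illustrative, not from the paper):

```python
import math
from itertools import combinations
from statistics import median

def median_heuristic_gamma(X):
    """Estimate the RBF-kernel gamma from data geometry alone (no labels).

    Classic median heuristic: set the kernel bandwidth sigma to the median
    pairwise Euclidean distance, then gamma = 1 / (2 * sigma**2).
    """
    # All unordered pairwise Euclidean distances (O(n^2) pairs).
    dists = [math.dist(a, b) for a, b in combinations(X, 2)]
    sigma = median(dists)
    return 1.0 / (2.0 * sigma ** 2)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
gamma = median_heuristic_gamma(X)
```

A single pass over the pairwise distances replaces the |grid| × k model fits that GSCV would run for the same parameter, which is where the order-of-magnitude speedup mentioned above comes from.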
Related papers
- Cost-sensitive probabilistic predictions for support vector machines [1.743685428161914]
Support vector machines (SVMs) are widely used and constitute one of the most thoroughly examined machine learning models.
We propose a novel approach to generate probabilistic outputs for the SVM.
arXiv Detail & Related papers (2023-10-09T11:00:17Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective way for GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Primal Estimated Subgradient Solver for SVM for Imbalanced Classification [0.0]
We aim to demonstrate that our cost-sensitive PEGASOS SVM achieves good performance on imbalanced data sets with a Majority to Minority Ratio ranging from 8.6:1 to 130:1.
We evaluate the performance by examining the learning curves.
We benchmark our PEGASOS Cost-Sensitive SVM's results against Ding's LINEAR SVM DECIDL method.
arXiv Detail & Related papers (2022-06-19T02:33:14Z)
- On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition [65.67315418971688]
We show that truncating small eigenvalues of the Global Covariance Pooling (GCP) can attain smoother gradients.
On fine-grained datasets, truncating the small eigenvalues would make the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
arXiv Detail & Related papers (2022-05-26T11:41:36Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification accuracy than previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- Classifications based on response times for detecting early-stage Alzheimer's disease [0.0]
This paper mainly describes a way to detect with high accuracy patients with early-stage Alzheimer's disease (ES-AD) versus healthy control (HC) subjects.
The solution presented in this paper makes two or even four times fewer errors than the best state-of-the-art results for HC/ES-AD classification from handwriting and drawing tasks.
arXiv Detail & Related papers (2021-02-01T10:08:08Z)
- Average Localised Proximity: a new data descriptor with good default one-class classification performance [4.894976692426517]
One-class classification is a challenging subfield of machine learning.
Data descriptors are used to predict membership of a class based solely on positive examples of that class.
arXiv Detail & Related papers (2021-01-26T19:14:14Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the naturally non-uniformly distributed data in real-world scenarios.
While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.