Average Localised Proximity: a new data descriptor with good default
one-class classification performance
- URL: http://arxiv.org/abs/2101.11037v1
- Date: Tue, 26 Jan 2021 19:14:14 GMT
- Title: Average Localised Proximity: a new data descriptor with good default
one-class classification performance
- Authors: Oliver Urs Lenz, Daniel Peralta, Chris Cornelis
- Abstract summary: One-class classification is a challenging subfield of machine learning.
Data descriptors are used to predict membership of a class based solely on positive examples of that class.
- Score: 4.894976692426517
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One-class classification is a challenging subfield of machine learning in
which so-called data descriptors are used to predict membership of a class
based solely on positive examples of that class, and no counter-examples. A
number of data descriptors that have been shown to perform well in previous
studies of one-class classification, like the Support Vector Machine (SVM),
require setting one or more hyperparameters. There has been no systematic
attempt to date to determine optimal default values for these hyperparameters,
which limits their ease of use, especially in comparison with
hyperparameter-free proposals like the Isolation Forest (IF). We address this
issue by determining optimal default hyperparameter values across a collection
of 246 one-class classification problems derived from 50 different real-world
datasets. In addition, we propose a new data descriptor, Average Localised
Proximity (ALP), to address certain issues with existing approaches based on
nearest neighbour distances. Finally, we evaluate classification performance
using a leave-one-dataset-out procedure, and find strong evidence that ALP
outperforms IF and a number of other data descriptors, as well as weak evidence
that it outperforms SVM, making ALP a good default choice.
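The abstract describes ALP as a refinement of data descriptors based on nearest neighbour distances. As a rough illustration of that family only, here is a minimal nearest-neighbour-distance scorer with a local normalisation step; the function names and the exact normalisation are assumptions made for this sketch and do not reproduce the paper's ALP formula.

```python
# Minimal sketch of a nearest-neighbour-distance (NND) style data
# descriptor. NOTE: this is an illustrative assumption, not the exact
# ALP formulation from the paper.
import numpy as np

def fit_nnd(X_pos, k=3):
    """'Fitting' stores the positive examples and precomputes, for each
    training point, its distance to its own k-th nearest neighbour
    (excluding itself), used later for local normalisation."""
    X_pos = np.asarray(X_pos, dtype=float)
    d = np.linalg.norm(X_pos[:, None, :] - X_pos[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-distances
    kth = np.sort(d, axis=1)[:, k - 1]     # k-th NN distance per point
    return {"X": X_pos, "k": k, "kth": kth}

def score_nnd(model, y):
    """Higher score = more typical of the positive class. The query's
    k-th NN distance is compared with the local scale of its neighbours."""
    d = np.linalg.norm(model["X"] - np.asarray(y, dtype=float), axis=-1)
    order = np.argsort(d)
    k = model["k"]
    d_k = d[order[k - 1]]                   # query's k-th NN distance
    local = model["kth"][order[:k]].mean()  # neighbours' typical scale
    return local / (local + d_k)            # in (0, 1); ~1 means close

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # positive class: standard Gaussian
m = fit_nnd(X, k=3)
inlier = score_nnd(m, [0.0, 0.0])
outlier = score_nnd(m, [8.0, 8.0])
print(inlier > outlier)              # True: the inlier scores higher
```

The local normalisation is what distinguishes this family from a raw distance threshold: a query is judged against the density of its own neighbourhood, so sparse but legitimate regions of the positive class are not automatically flagged as outliers.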
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data, then assigns a minimal number of available labeled data points to key best matching units (BMUs)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Cost-sensitive probabilistic predictions for support vector machines [1.743685428161914]
Support vector machines (SVMs) are widely used and constitute one of the most thoroughly examined machine learning models.
We propose a novel approach to generate probabilistic outputs for the SVM.
arXiv Detail & Related papers (2023-10-09T11:00:17Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Self-Adaptive Label Augmentation for Semi-supervised Few-shot Classification [121.63992191386502]
Few-shot classification aims to learn a model that can generalize well to new tasks when only a few labeled samples are available.
We propose Self-Adaptive Label Augmentation (SALA), a semi-supervised few-shot classification method that assigns an appropriate label to each unlabeled sample by a manually defined metric.
A major novelty of SALA is the task-adaptive metric, which can learn the metric adaptively for different tasks in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-16T13:14:03Z)
- Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate [4.812468844362369]
We introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class.
We show that classification decisions made by simply sorting objects across classes in descending order of their mLPRs can, in theory, ensure the class hierarchy.
In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy.
arXiv Detail & Related papers (2022-05-16T17:43:35Z)
- Data structure > labels? Unsupervised heuristics for SVM hyperparameter estimation [0.9208007322096532]
The Support Vector Machine is a de facto reference for many machine learning approaches.
Parameter selection is usually achieved by a time-consuming grid search cross-validation procedure (GSCV).
We propose improved heuristics for SVM parameter selection and test them against GSCV and the state of the art on over 30 standard classification datasets.
arXiv Detail & Related papers (2021-11-03T12:04:03Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Optimised one-class classification performance [4.894976692426517]
We treat the optimisation of three data descriptors: Support Vector Machine (SVM), Nearest Neighbour Distance (NND) and Average Localised Proximity (ALP).
We experimentally evaluate the effect of hyperparameter optimisation on 246 classification problems drawn from 50 datasets.
arXiv Detail & Related papers (2021-02-04T14:08:20Z)
- A novel embedded min-max approach for feature selection in nonlinear support vector machine classification [0.0]
We propose an embedded feature selection method based on a min-max optimization problem.
By leveraging duality theory, we equivalently reformulate the min-max problem and solve it directly.
The efficiency and usefulness of our approach are tested on several benchmark data sets.
arXiv Detail & Related papers (2020-04-21T09:40:38Z) - Selecting Relevant Features from a Multi-domain Representation for
Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
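The last paper above pairs feature selection with a simple non-parametric classifier. A hedged sketch of that general pattern follows; the Fisher-style selection ratio and the nearest-centroid classifier are both illustrative assumptions, not the method of any paper in this list.

```python
# Hedged sketch: feature selection followed by a simple non-parametric
# (nearest-centroid) classifier. The selection rule below is an
# assumption for illustration, not a criterion taken from the papers.
import numpy as np

def select_features(X, y, n_keep):
    """Keep the n_keep dimensions whose class means differ most,
    relative to within-class spread (a simple Fisher-style ratio)."""
    classes = np.unique(y)
    mus = np.stack([X[y == c].mean(axis=0) for c in classes])
    within = np.stack([X[y == c].std(axis=0) for c in classes]).mean(axis=0)
    ratio = mus.std(axis=0) / (within + 1e-8)
    return np.argsort(ratio)[-n_keep:]

def nearest_centroid_predict(X_tr, y_tr, X_te, dims):
    """Classify each test point by its nearest class centroid,
    computed only on the selected dimensions."""
    classes = np.unique(y_tr)
    cents = np.stack([X_tr[y_tr == c][:, dims].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_te[:, dims][:, None, :] - cents[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(1)
# Two classes separated only in the first two dimensions; the rest is noise.
X0 = rng.normal(0.0, 1.0, size=(30, 10)); X0[:, :2] += 3.0
X1 = rng.normal(0.0, 1.0, size=(30, 10)); X1[:, :2] -= 3.0
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)

dims = select_features(X, y, n_keep=2)
pred = nearest_centroid_predict(X, y, X, dims)
acc = (pred == y).mean()
print(acc)  # should be high, since the informative dimensions were kept
```

The point of the sketch is the division of labour: once the informative dimensions are identified, a classifier with no learned parameters beyond the class centroids is often enough, which is why such non-parametric heads generalize well to unseen domains.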
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.