Efficient SVDD Sampling with Approximation Guarantees for the Decision
Boundary
- URL: http://arxiv.org/abs/2009.13853v1
- Date: Tue, 29 Sep 2020 08:28:01 GMT
- Title: Efficient SVDD Sampling with Approximation Guarantees for the Decision
Boundary
- Authors: Adrian Englhardt, Holger Trittenbach, Daniel Kottke, Bernhard Sick,
and Klemens Böhm
- Abstract summary: Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection.
Despite its effectiveness, SVDD does not scale well with data size.
In this article, we study how to select a sample considering these points.
Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that sampling indeed approximates the original decision boundary.
- Score: 7.251418581794502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Support Vector Data Description (SVDD) is a popular one-class classifier for
anomaly and novelty detection. But despite its effectiveness, SVDD does not
scale well with data size. To avoid prohibitive training times, sampling
methods select small subsets of the training data on which SVDD trains a
decision boundary hopefully equivalent to the one obtained on the full data
set. According to the literature, a good sample should therefore contain
so-called boundary observations that SVDD would select as support vectors on
the full data set. However, non-boundary observations are also essential: they
prevent fragmenting contiguous inlier regions and thus avoid poor classification accuracy.
Other aspects, such as selecting a sufficiently representative sample, are
important as well. But existing sampling methods largely overlook them,
resulting in poor classification accuracy. In this article, we study how to
select a sample considering these points. Our approach is to frame SVDD
sampling as an optimization problem, where constraints guarantee that sampling
indeed approximates the original decision boundary. We then propose RAPID, an
efficient algorithm to solve this optimization problem. RAPID does not require
any tuning of parameters, is easy to implement and scales well to large data
sets. We evaluate our approach on real-world and synthetic data. Our evaluation
is the most comprehensive one for SVDD sampling so far. Our results show that
RAPID outperforms its competitors in classification accuracy, in sample size,
and in runtime.
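The core idea of the abstract can be illustrated with a small sketch. SVDD with an RBF kernel is equivalent to the one-class SVM, so scikit-learn's `OneClassSVM` stands in for an SVDD trainer here. The snippet below is not RAPID; it uses naive random sampling as a baseline and measures how closely the sample-trained decision boundary agrees with the full-data one, which is the quantity RAPID's constraints are designed to control. The kernel parameters (`gamma=0.5`, `nu=0.05`), data sizes, and the grid-based agreement metric are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))  # synthetic 2-D inlier data

# SVDD trained on the full data set (one-class SVM with an RBF kernel
# is equivalent to SVDD; parameters here are illustrative assumptions).
full = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X)

# Naive baseline: train on a small random sample. RAPID instead selects
# the sample so that boundary agreement is guaranteed by construction.
sample = X[rng.choice(len(X), size=200, replace=False)]
small = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(sample)

# Measure how often the two decision boundaries classify test points
# the same way (predict returns +1 for inliers, -1 for outliers).
grid = rng.uniform(-3.0, 3.0, size=(5000, 2))
agreement = float(np.mean(full.predict(grid) == small.predict(grid)))
print(f"boundary agreement on test grid: {agreement:.2%}")
```

A sampling method succeeds when this agreement stays near 100% while the sample (here 200 of 2000 points) stays small; the paper's contribution is an algorithm that achieves this with explicit approximation guarantees rather than by chance.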
Related papers
- Deep Active Learning with Manifold-preserving Trajectory Sampling [2.0717982775472206]
Active learning (AL) is for optimizing the selection of unlabeled data for annotation (labeling)
Existing deep AL methods arguably suffer from bias incurred by labeled data, which makes up a much lower percentage than unlabeled data in the AL context.
We propose a novel method, namely Manifold-Preserving Trajectory Sampling (MPTS), aiming to enforce the feature space learned from labeled data to represent a more accurate manifold.
arXiv Detail & Related papers (2024-10-21T03:04:09Z) - BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges [12.248397169100784]
Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training.
We introduce a universal and efficient data subset selection method, Best Window Selection (BWS), by proposing a method to choose the best window subset from samples ordered based on their difficulty scores.
arXiv Detail & Related papers (2024-06-05T08:33:09Z) - Efficient Hybrid Oversampling and Intelligent Undersampling for
Imbalanced Big Data Classification [1.03590082373586]
We present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework.
Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets.
arXiv Detail & Related papers (2023-10-09T15:22:13Z) - AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z) - A sub-sampling algorithm preventing outliers [0.0]
We propose an unsupervised exchange procedure that enables us to select a nearly D-optimal subset of observations without high leverage points.
We also provide a supervised version of this exchange procedure, where besides high leverage points also the outliers in the responses are avoided.
Both the unsupervised and the supervised selection procedures are generalized to I-optimality, with the goal of getting accurate predictions.
arXiv Detail & Related papers (2022-08-12T11:03:57Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - Pareto Optimization for Active Learning under Out-of-Distribution Data
Scenarios [79.02009938011447]
We propose a sampling scheme, which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool.
Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
arXiv Detail & Related papers (2022-07-04T04:11:44Z) - AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail
Problems [102.95119281306893]
We present an early trial to explore adversarial training methods to optimize AUC.
We reformulate the AUC optimization problem as a saddle point problem, where the objective becomes an instance-wise function.
Our analysis differs from the existing studies since the algorithm is asked to generate adversarial examples by calculating the gradient of a min-max problem.
arXiv Detail & Related papers (2022-06-24T09:13:39Z) - A Robust Optimization Method for Label Noisy Datasets Based on Adaptive
Threshold: Adaptive-k [0.0]
SGD does not produce robust results on datasets with label noise.
In this paper, we recommend using samples with loss less than a threshold value determined during the optimization process, instead of using all samples in the mini-batch.
Our proposed method, Adaptive-k, aims to exclude label noise samples from the optimization process and make the process robust.
arXiv Detail & Related papers (2022-03-26T21:48:12Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Tune it the Right Way: Unsupervised Validation of Domain Adaptation via
Soft Neighborhood Density [125.64297244986552]
We propose an unsupervised validation criterion that measures the density of soft neighborhoods by computing the entropy of the similarity distribution between points.
Our criterion is simpler than competing validation methods, yet more effective.
arXiv Detail & Related papers (2021-08-24T17:41:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences.