Restricted Generative Projection for One-Class Classification and
Anomaly Detection
- URL: http://arxiv.org/abs/2307.04097v1
- Date: Sun, 9 Jul 2023 04:59:10 GMT
- Title: Restricted Generative Projection for One-Class Classification and
Anomaly Detection
- Authors: Feng Xiao, Ruoyu Sun, Jicong Fan
- Abstract summary: We learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution.
The simplicity is to ensure that we can sample from the distribution easily.
The compactness is to ensure that the decision boundary between normal data and abnormal data is clear.
The informativeness is to ensure that the transformed data preserve the important information of the original data.
- Score: 31.173234437065464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple framework for one-class classification and anomaly
detection. The core idea is to learn a mapping to transform the unknown
distribution of training (normal) data to a known target distribution.
Crucially, the target distribution should be sufficiently simple, compact, and
informative. The simplicity is to ensure that we can sample from the
distribution easily, the compactness is to ensure that the decision boundary
between normal data and abnormal data is clear and reliable, and the
informativeness is to ensure that the transformed data preserve the important
information of the original data. Therefore, we propose to use truncated
Gaussian, uniform in hypersphere, uniform on hypersphere, or uniform between
hyperspheres, as the target distribution. We then minimize the distance between
the transformed data distribution and the target distribution while keeping the
reconstruction error for the original data small enough. Comparative studies on
multiple benchmark datasets verify the effectiveness of our methods in
comparison to baselines.
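As a sketch of the ingredients above: the target distributions can be sampled in closed form, and a sample-based distance such as MMD (one illustrative choice; the abstract does not specify the exact distance measure) can be minimized between the transformed data and a target sample. The function names below are hypothetical:

```python
import numpy as np

def sample_uniform_on_hypersphere(n, d, rng):
    """Uniform on the unit hypersphere S^{d-1}: normalize Gaussian draws."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def sample_uniform_in_hypersphere(n, d, rng):
    """Uniform in the unit ball: random direction scaled by radius U^(1/d)."""
    direction = sample_uniform_on_hypersphere(n, d, rng)
    radius = rng.random((n, 1)) ** (1.0 / d)
    return direction * radius

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y (RBF kernel)."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()
```

A training step would then minimize something like `rbf_mmd2(encoder(x), target_batch) + lam * recon_err`, where the `encoder`, the weight `lam`, and the reconstruction term are up to the implementation.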
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantees, with explicit dimensional dependence, for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Beyond Discrepancy: A Closer Look at the Theory of Distribution Shift [27.99789694038377]
This work takes a closer look at the theory of distribution shift for a classifier from a source to a target distribution.
We show when only unlabeled data from the target is sufficient, and when labeled target data is needed.
In all cases, we provide rigorous theoretical guarantees in the large sample regime.
arXiv Detail & Related papers (2024-05-29T15:00:19Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Beyond the Known: Adversarial Autoencoders in Novelty Detection [2.7486022583843233]
In novelty detection, the goal is to decide if a new data point should be categorized as an inlier or an outlier.
We use a similar framework but with a lightweight deep network, and we adopt a probabilistic score with reconstruction error.
Our results indicate that our approach is effective at learning the target class, and it outperforms recent state-of-the-art methods on several benchmark datasets.
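A probabilistic score combined with reconstruction error, as mentioned above, could take the following shape; the exact score in the paper may differ, and `encode`, `decode`, and `log_prior` are hypothetical callables standing in for the trained networks and a latent prior:

```python
import numpy as np

def novelty_score(x, encode, decode, log_prior):
    """Illustrative combined score: reconstruction error plus a latent
    unlikelihood term. A higher score suggests an outlier."""
    z = encode(x)
    recon_err = ((decode(z) - x) ** 2).mean()
    return recon_err - log_prior(z)
```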
arXiv Detail & Related papers (2024-04-06T00:04:19Z)
- Probabilistic Matching of Real and Generated Data Statistics in Generative Adversarial Networks [0.6906005491572401]
We propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data.
We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
arXiv Detail & Related papers (2023-06-19T14:03:27Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
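The randomization procedure above amounts to classifying under random draws from the transformation space and aggregating; a minimal sketch, where `classify` and `sample_transform` are hypothetical stand-ins for the model and the transformation sampler (the certified guarantee itself requires the analysis in the paper):

```python
import numpy as np

def smoothed_predict(classify, x, sample_transform, n=100):
    """Randomize the input within a transformation space and majority-vote
    over the resulting predictions."""
    votes = np.array([classify(sample_transform(x)) for _ in range(n)])
    vals, counts = np.unique(votes, return_counts=True)
    return int(vals[counts.argmax()])
```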
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
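The ATC idea reduces to a quantile computation; a minimal sketch, assuming per-example confidences and correctness indicators are available as arrays (the function names are illustrative, and the paper's confidence measure may be e.g. max softmax or negative entropy):

```python
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Learn a threshold t on labeled source data such that the fraction of
    source examples with confidence above t equals the source accuracy."""
    acc = source_correct.mean()
    return np.quantile(source_conf, 1.0 - acc)

def atc_predict(target_conf, t):
    """Predict target accuracy as the fraction of unlabeled target examples
    whose confidence exceeds the learned threshold."""
    return (target_conf > t).mean()
```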
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Dealing with Distribution Mismatch in Semi-supervised Deep Learning for
Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature
Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
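The minibatch KL estimate can be sketched as follows, under the simplifying assumption that each marginal representation distribution is approximated by a minibatch mixture of isotropic Gaussians with a shared, known scale (an illustrative stand-in for the paper's probabilistic representation network):

```python
import numpy as np

def log_gauss(z, mu, sigma):
    """Log density of an isotropic Gaussian N(mu, sigma^2 I) at the rows of z."""
    d = z.shape[1]
    return (-0.5 * ((z - mu) ** 2).sum(axis=1) / sigma ** 2
            - d * np.log(sigma) - 0.5 * d * np.log(2.0 * np.pi))

def log_mixture(z, mus, sigma):
    """Log density of the minibatch mixture (1/m) * sum_i N(mu_i, sigma^2 I)."""
    comps = np.stack([log_gauss(z, mu, sigma) for mu in mus])  # (m, n)
    c = comps.max(axis=0)  # log-sum-exp for numerical stability
    return c + np.log(np.exp(comps - c).mean(axis=0))

def minibatch_kl(z_s, mus_s, mus_t, sigma=1.0):
    """Monte Carlo estimate of KL(p_s(z) || p_t(z)): z_s are samples from the
    source marginal; each marginal is approximated by its minibatch mixture."""
    return (log_mixture(z_s, mus_s, sigma) - log_mixture(z_s, mus_t, sigma)).mean()
```

When the two minibatch mixtures coincide the estimate is zero, and it grows as the source and target representation distributions separate.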
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.