robROSE: A robust approach for dealing with imbalanced data in fraud
detection
- URL: http://arxiv.org/abs/2003.11915v1
- Date: Sun, 22 Mar 2020 16:11:07 GMT
- Title: robROSE: A robust approach for dealing with imbalanced data in fraud
detection
- Authors: Bart Baesens, Sebastiaan Höppner, Irene Ortner, and Tim Verdonck
- Abstract summary: A major challenge when trying to detect fraud is that the fraudulent activities form a minority class that makes up a very small proportion of the data set.
We present a robust version of ROSE, called robROSE, which combines several promising approaches to cope simultaneously with the problem of imbalanced data.
- Score: 2.1734195143282697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge when trying to detect fraud is that the fraudulent
activities form a minority class that makes up a very small proportion of the
data set. In most data sets, fraud typically occurs in less than 0.5% of the
cases. Detecting fraud in such a highly imbalanced data set typically leads to
predictions that favor the majority group, causing fraud to remain undetected.
We discuss some popular oversampling techniques that solve the problem of
imbalanced data by creating synthetic samples that mimic the minority class. A
frequent problem when analyzing real data is the presence of anomalies or
outliers. When such atypical observations are present in the data, most
oversampling techniques are prone to create synthetic samples that distort the
detection algorithm and spoil the resulting analysis. A useful tool for anomaly
detection is robust statistics, which aims to find the outliers by first
fitting the majority of the data and then flagging data observations that
deviate from it. In this paper, we present a robust version of ROSE, called
robROSE, which combines several promising approaches to cope simultaneously
with the problem of imbalanced data and the presence of outliers. The proposed
method enhances the presence of the fraud cases while ignoring
anomalies. The good performance of our new sampling technique is illustrated on
simulated and real data sets, and it is shown that robROSE can provide better
insight into the structure of the data. The source code of the robROSE algorithm
is made freely available.
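The idea in the abstract, oversampling the minority class while ignoring outliers, can be illustrated with a minimal sketch. This is not the authors' robROSE implementation. It assumes a deliberately simple robust filter (per-feature median and MAD rather than a multivariate robust estimator) followed by a ROSE-style smoothed bootstrap with Gaussian noise and a Silverman-style rule-of-thumb bandwidth. The function names `flag_outliers` and `robust_smoothed_oversample`, the `cutoff` value, and the toy data are all hypothetical.

```python
import numpy as np


def flag_outliers(X, cutoff=3.5):
    """Flag rows whose robust z-score exceeds `cutoff` in any feature.

    Assumption: a coordinate-wise median/MAD rule stands in for the
    multivariate robust estimator a real method would likely use.
    """
    med = np.median(X, axis=0)
    # 1.4826 makes the MAD consistent with the standard deviation
    # for normally distributed data.
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # guard against constant features
    z = np.abs(X - med) / mad
    return (z > cutoff).any(axis=1)


def robust_smoothed_oversample(X_min, n_new, cutoff=3.5, seed=0):
    """Draw `n_new` synthetic minority samples from the non-outlying cases.

    Each synthetic point is a randomly chosen clean minority case plus
    Gaussian noise scaled by a per-feature rule-of-thumb bandwidth.
    Needs at least two clean cases to estimate the feature scales.
    """
    rng = np.random.default_rng(seed)
    clean = X_min[~flag_outliers(X_min, cutoff)]
    n, d = clean.shape
    scale = clean.std(axis=0, ddof=1)
    h = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4)) * scale
    centers = clean[rng.integers(0, n, size=n_new)]
    return centers + rng.normal(size=(n_new, d)) * h


# Toy minority class: 30 "fraud" cases, one of them a gross outlier.
rng = np.random.default_rng(42)
fraud = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(30, 2))
fraud[0] = [25.0, -20.0]

synthetic = robust_smoothed_oversample(fraud, n_new=200)
print(synthetic.shape)
```

Because the outlying case is excluded before resampling, no synthetic point is generated around it. A plain smoothed bootstrap over all 30 cases would place roughly one in thirty synthetic samples near the outlier.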
Related papers
- Research on Dynamic Data Flow Anomaly Detection based on Machine Learning [11.526496773281938]
In this study, the unsupervised learning method is employed to identify anomalies in dynamic data flows.
By clustering similar data, the model is able to detect data behaviour that deviates significantly from normal traffic without the need for labelled data.
Notably, it demonstrates robust and adaptable performance, particularly in the context of unbalanced data.
arXiv Detail & Related papers (2024-09-23T08:19:15Z)
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$_2$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
- SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
- Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models [0.0]
Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models.
We present a sequential selection method that identifies critically important information within a dataset.
We find these instabilities are a result of the complexity of the underlying map and linked to extreme events and heavy tails.
arXiv Detail & Related papers (2022-08-27T19:43:53Z)
- Credit card fraud detection - Classifier selection strategy [0.0]
Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds.
Fraud data sets are diverse and exhibit inconsistent characteristics.
We propose a data-driven classifier selection strategy for characteristic highly imbalanced fraud detection data sets.
arXiv Detail & Related papers (2022-08-25T07:13:42Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Efficient remedies for outlier detection with variational autoencoders [8.80692072928023]
Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data.
We show that a theoretically-grounded correction readily ameliorates a key bias with VAE likelihood estimates.
We also show that the variance of the likelihoods computed over an ensemble of VAEs enables robust outlier detection.
arXiv Detail & Related papers (2021-08-19T16:00:58Z)
- Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z)
- Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes detecting fake without forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z)
- A Novel Resampling Technique for Imbalanced Dataset Optimization [1.0323063834827415]
Classification of rare events is a common problem in many domains, such as fraudulent transactions, malware traffic analysis and network intrusion detection.
We develop two versions of Generative Silhouette Resampling 1-Nearest Neighbour (G1Nos) oversampling algorithms for dealing with the class imbalance problem.
arXiv Detail & Related papers (2020-12-30T17:17:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.