Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes
- URL: http://arxiv.org/abs/2209.01642v1
- Date: Sun, 4 Sep 2022 15:30:23 GMT
- Title: Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes
- Authors: Mary Isangediok, Kelum Gajamannage
- Abstract summary: Fraud detection with smart versions of machine learning (ML) tools is essential to assure safety.
We investigate four state-of-the-art ML techniques, namely, logistic regression, decision trees, random forest, and extreme gradient boost.
For phishing website URLs and credit card fraud transaction datasets, the results indicate that extreme gradient boost trained on the original data shows trustworthy performance.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fraud detection is a challenging task due to the changing nature of fraud
patterns over time and the limited availability of fraud examples to learn such
sophisticated patterns. Thus, fraud detection with the aid of smart versions of
machine learning (ML) tools is essential to assure safety. Fraud detection is a
primary ML classification task; however, the optimum performance of the
corresponding ML tool relies on the usage of the best hyperparameter values.
Moreover, classification under imbalanced classes is quite challenging as it
causes poor performance in minority classes, which most ML classification
techniques ignore. Thus, we investigate four state-of-the-art ML techniques,
namely, logistic regression, decision trees, random forest, and extreme
gradient boost, that are suitable for handling imbalanced classes to maximize
precision and simultaneously reduce false positives. First, these classifiers
are trained on two original benchmark unbalanced fraud detection datasets,
namely, phishing website URLs and fraudulent credit card transactions. Then,
three synthetically balanced datasets are produced for each original data set
by implementing the sampling frameworks, namely, RandomUnderSampler, SMOTE, and
SMOTEENN. The optimum hyperparameters for all 16 experiments are revealed
using RandomizedSearchCV. The validity of the 16 approaches in the
context of fraud detection is compared using two benchmark performance metrics,
namely, area under the curve of receiver operating characteristics (AUC ROC)
and area under the curve of precision and recall (AUC PR). For both phishing
website URLs and credit card fraud transaction datasets, the results indicate
that extreme gradient boost trained on the original data shows trustworthy
performance in the imbalanced dataset and manages to outperform the other three
methods in terms of both AUC ROC and AUC PR.
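The experimental loop described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' code: it uses plain scikit-learn with a hand-rolled random undersampler and logistic regression (one of the paper's four classifiers) in place of the imbalanced-learn samplers and extreme gradient boost, but it shows the same pattern of tuning with RandomizedSearchCV and comparing AUC ROC against AUC PR on original versus resampled training data.

```python
# Sketch of the experimental loop: fit a classifier on the original
# imbalanced data and on a randomly undersampled copy, tune hyperparameters
# with RandomizedSearchCV, and compare AUC ROC / AUC PR on a held-out split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for an imbalanced fraud dataset (~3% positives).
X, y = make_classification(n_samples=4000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def undersample(X, y):
    """Drop majority-class rows until both classes are the same size."""
    minority = np.flatnonzero(y == 1)
    majority = rng.choice(np.flatnonzero(y == 0), size=minority.size, replace=False)
    keep = np.concatenate([minority, majority])
    return X[keep], y[keep]

results = {}
for name, (Xf, yf) in {"original": (X_tr, y_tr),
                       "undersampled": undersample(X_tr, y_tr)}.items():
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": np.logspace(-3, 3, 20)},  # candidate regularization strengths
        n_iter=10, scoring="average_precision", random_state=0,
    )
    search.fit(Xf, yf)
    scores = search.predict_proba(X_te)[:, 1]
    results[name] = (roc_auc_score(y_te, scores),
                     average_precision_score(y_te, scores))

for name, (auc_roc, auc_pr) in results.items():
    print(f"{name:>12}: AUC ROC={auc_roc:.3f}  AUC PR={auc_pr:.3f}")
```

Note that both metrics are computed on the untouched test split; resampling is applied to the training data only, which mirrors the paper's setup of evaluating every sampler-classifier combination on the original class distribution.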
Related papers
- Malicious URL Detection using optimized Hist Gradient Boosting Classifier based on grid search method [0.0]
Trusting the accuracy of data inputted on online platforms can be difficult due to the possibility of malicious websites gathering information for unlawful reasons.
To detect the risk posed by malicious websites, it is proposed to utilize Machine Learning (ML)-based techniques.
The dataset used contains 1781 records of malicious and benign website data with 13 features.
arXiv Detail & Related papers (2024-06-12T11:16:30Z) - Securing Transactions: A Hybrid Dependable Ensemble Machine Learning Model using IHT-LR and Grid Search [2.4374097382908477]
We introduce a state-of-the-art hybrid ensemble (ENS) Machine learning (ML) model that intelligently combines multiple algorithms to enhance fraud identification.
Our experiments are conducted on a publicly available credit card dataset comprising 284,807 transactions.
The proposed model achieves impressive accuracy rates of 99.66%, 99.73%, 98.56%, and 99.79%, and a perfect 100% for the DT, RF, KNN, and ENS models, respectively.
arXiv Detail & Related papers (2024-02-22T09:01:42Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Performance evaluation of Machine learning algorithms for Intrusion Detection System [0.40964539027092917]
This paper focuses on intrusion detection systems (IDSs) analysis using Machine Learning (ML) techniques.
We analyze the KDD CUP-'99' intrusion detection dataset used for training and validating ML models.
arXiv Detail & Related papers (2023-10-01T06:35:37Z) - Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data [0.8223798883838329]
This paper implements the random forest (RF) algorithm to solve the issue at hand.
A dataset of credit card transactions was used in this study.
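The summary above does not detail the enhancement, but a common way to adapt a random forest to imbalanced credit card data in scikit-learn is to reweight classes inversely to their frequency within each bootstrap sample. The sketch below is a generic illustration of that idea, not the paper's specific method.

```python
# Compare a default random forest against one with per-bootstrap class
# reweighting ("balanced_subsample") on an imbalanced synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for credit card transactions (~2% fraud).
X, y = make_classification(n_samples=3000, weights=[0.98, 0.02], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

plain = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
weighted = RandomForestClassifier(class_weight="balanced_subsample",
                                  random_state=1).fit(X_tr, y_tr)

aps = {}
for name, model in [("plain", plain), ("weighted", weighted)]:
    # AUC PR is the more informative metric when positives are rare.
    aps[name] = average_precision_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC PR = {aps[name]:.3f}")
```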
arXiv Detail & Related papers (2023-03-11T22:59:37Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Credit card fraud detection - Classifier selection strategy [0.0]
Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds.
Fraud data sets are diverse and exhibit inconsistent characteristics.
We propose a data-driven classifier selection strategy for such characteristically highly imbalanced fraud detection data sets.
arXiv Detail & Related papers (2022-08-25T07:13:42Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
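The ATC idea sketched above fits in a few lines of NumPy: pick a confidence threshold on labeled source data so that the fraction of source examples above it matches source accuracy, then report the fraction of unlabeled target examples above that threshold as the predicted target accuracy. This is a simplified paraphrase of the method; the toy confidence distributions below are invented for illustration.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Average Thresholded Confidence (simplified sketch).

    Choose a threshold t on source confidences so that the fraction of
    source examples with confidence above t equals the source accuracy,
    then predict target accuracy as the fraction of target confidences
    above t.
    """
    source_acc = source_correct.mean()
    # Threshold at the (1 - accuracy) quantile of source confidences.
    t = np.quantile(source_conf, 1.0 - source_acc)
    return (target_conf > t).mean()

rng = np.random.default_rng(0)
# Toy model: correct predictions tend to be more confident.
source_correct = rng.random(5000) < 0.9
source_conf = np.where(source_correct,
                       rng.uniform(0.7, 1.0, 5000),
                       rng.uniform(0.3, 0.8, 5000))
# A "shifted" target domain where confidences are lower overall.
target_conf = source_conf - 0.1
print(atc_predict_accuracy(source_conf, source_correct, target_conf))
```

By construction, applying the learned threshold back to the source confidences recovers (approximately) the source accuracy; the interesting output is the lower estimate on the shifted target distribution.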
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - A Symmetric Loss Perspective of Reliable Machine Learning [87.68601212686086]
We review how a symmetric loss can yield robust classification from corrupted labels in balanced error rate (BER) minimization.
We demonstrate how the robust AUC method can benefit natural language processing in the problem where we want to learn only from relevant keywords.
arXiv Detail & Related papers (2021-01-05T06:25:47Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.