Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes
- URL: http://arxiv.org/abs/2209.01642v1
- Date: Sun, 4 Sep 2022 15:30:23 GMT
- Title: Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes
- Authors: Mary Isangediok, Kelum Gajamannage
- Abstract summary: Fraud detection with smart versions of machine learning (ML) tools is essential to assure safety.
We investigate four state-of-the-art ML techniques, namely, logistic regression, decision trees, random forest, and extreme gradient boost.
For phishing website URLs and credit card fraud transaction datasets, the results indicate that extreme gradient boost trained on the original data shows trustworthy performance.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fraud detection is a challenging task due to the changing nature of fraud
patterns over time and the limited availability of fraud examples to learn such
sophisticated patterns. Thus, fraud detection with the aid of smart versions of
machine learning (ML) tools is essential to assure safety. Fraud detection is a
primary ML classification task; however, the optimum performance of the
corresponding ML tool relies on the usage of the best hyperparameter values.
Moreover, classification under imbalanced classes is quite challenging as it
causes poor performance in minority classes, which most ML classification
techniques ignore. Thus, we investigate four state-of-the-art ML techniques,
namely, logistic regression, decision trees, random forest, and extreme
gradient boost, that are suitable for handling imbalanced classes to maximize
precision and simultaneously reduce false positives. First, these classifiers
are trained on two original benchmark unbalanced fraud detection datasets,
namely, phishing website URLs and fraudulent credit card transactions. Then,
three synthetically balanced datasets are produced for each original data set
by implementing the sampling frameworks, namely, RandomUnderSampler, SMOTE, and
SMOTEENN. The optimum hyperparameters for all 16 experiments are revealed
using RandomizedSearchCV. The validity of the 16 approaches in the
context of fraud detection is compared using two benchmark performance metrics,
namely, area under the curve of receiver operating characteristics (AUC ROC)
and area under the curve of precision and recall (AUC PR). For both phishing
website URLs and credit card fraud transaction datasets, the results indicate
that extreme gradient boost trained on the original data shows trustworthy
performance in the imbalanced dataset and manages to outperform the other three
methods in terms of both AUC ROC and AUC PR.
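The experimental loop described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' code: it uses plain scikit-learn with a hand-rolled random undersampler and logistic regression (one of the paper's four classifiers) in place of the imbalanced-learn samplers and extreme gradient boost, but it shows the same pattern of tuning with RandomizedSearchCV and comparing AUC ROC against AUC PR on original versus resampled training data.

```python
# Sketch of the experimental loop: fit a classifier on the original
# imbalanced data and on a randomly undersampled copy, tune hyperparameters
# with RandomizedSearchCV, and compare AUC ROC / AUC PR on a held-out split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for an imbalanced fraud dataset (~3% positives).
X, y = make_classification(n_samples=4000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def undersample(X, y):
    """Drop majority-class rows until both classes are the same size."""
    minority = np.flatnonzero(y == 1)
    majority = rng.choice(np.flatnonzero(y == 0), size=minority.size, replace=False)
    keep = np.concatenate([minority, majority])
    return X[keep], y[keep]

results = {}
for name, (Xf, yf) in {"original": (X_tr, y_tr),
                       "undersampled": undersample(X_tr, y_tr)}.items():
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": np.logspace(-3, 3, 20)},  # candidate regularization strengths
        n_iter=10, scoring="average_precision", random_state=0,
    )
    search.fit(Xf, yf)
    scores = search.predict_proba(X_te)[:, 1]
    results[name] = (roc_auc_score(y_te, scores),
                     average_precision_score(y_te, scores))

for name, (auc_roc, auc_pr) in results.items():
    print(f"{name:>12}: AUC ROC={auc_roc:.3f}  AUC PR={auc_pr:.3f}")
```

Note that both metrics are computed on the untouched test split; resampling is applied to the training data only, which mirrors the paper's setup of evaluating every sampler-classifier combination on the original class distribution.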
Related papers
- Malicious URL Detection using optimized Hist Gradient Boosting Classifier based on grid search method [0.0]
Trusting the accuracy of data inputted on online platforms can be difficult due to the possibility of malicious websites gathering information for unlawful reasons.
To detect the risk posed by malicious websites, it is proposed to utilize Machine Learning (ML)-based techniques.
The dataset used contains 1781 records of malicious and benign website data with 13 features.
arXiv Detail & Related papers (2024-06-12T11:16:30Z) - Securing Transactions: A Hybrid Dependable Ensemble Machine Learning Model using IHT-LR and Grid Search [2.4374097382908477]
We introduce a state-of-the-art hybrid ensemble (ENS) Machine learning (ML) model that intelligently combines multiple algorithms to enhance fraud identification.
Our experiments are conducted on a publicly available credit card dataset comprising 284,807 transactions.
The proposed model achieves impressive accuracy rates of 99.66%, 99.73%, 98.56%, and 99.79%, and a perfect 100% for the DT, RF, KNN, and ENS models, respectively.
arXiv Detail & Related papers (2024-02-22T09:01:42Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Performance evaluation of Machine learning algorithms for Intrusion Detection System [0.40964539027092917]
This paper focuses on intrusion detection systems (IDSs) analysis using Machine Learning (ML) techniques.
We analyze the KDD CUP-'99' intrusion detection dataset used for training and validating ML models.
arXiv Detail & Related papers (2023-10-01T06:35:37Z) - Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data [0.8223798883838329]
This paper implements the random forest (RF) algorithm to solve the issue at hand.
A dataset of credit card transactions was used in this study.
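The summary above does not detail the enhancement, but a common way to adapt a random forest to imbalanced credit card data in scikit-learn is to reweight classes inversely to their frequency within each bootstrap sample. The sketch below is a generic illustration of that idea, not the paper's specific method.

```python
# Compare a default random forest against one with per-bootstrap class
# reweighting ("balanced_subsample") on an imbalanced synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for credit card transactions (~2% fraud).
X, y = make_classification(n_samples=3000, weights=[0.98, 0.02], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

plain = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
weighted = RandomForestClassifier(class_weight="balanced_subsample",
                                  random_state=1).fit(X_tr, y_tr)

aps = {}
for name, model in [("plain", plain), ("weighted", weighted)]:
    # AUC PR is the more informative metric when positives are rare.
    aps[name] = average_precision_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC PR = {aps[name]:.3f}")
```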
arXiv Detail & Related papers (2023-03-11T22:59:37Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Credit card fraud detection - Classifier selection strategy [0.0]
Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds.
Fraud data sets are diverse and exhibit inconsistent characteristics.
We propose a data-driven classifier selection strategy for such characteristically highly imbalanced fraud detection data sets.
arXiv Detail & Related papers (2022-08-25T07:13:42Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
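The ATC idea sketched above fits in a few lines of NumPy: pick a confidence threshold on labeled source data so that the fraction of source examples above it matches source accuracy, then report the fraction of unlabeled target examples above that threshold as the predicted target accuracy. This is a simplified paraphrase of the method; the toy confidence distributions below are invented for illustration.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Average Thresholded Confidence (simplified sketch).

    Choose a threshold t on source confidences so that the fraction of
    source examples with confidence above t equals the source accuracy,
    then predict target accuracy as the fraction of target confidences
    above t.
    """
    source_acc = source_correct.mean()
    # Threshold at the (1 - accuracy) quantile of source confidences.
    t = np.quantile(source_conf, 1.0 - source_acc)
    return (target_conf > t).mean()

rng = np.random.default_rng(0)
# Toy model: correct predictions tend to be more confident.
source_correct = rng.random(5000) < 0.9
source_conf = np.where(source_correct,
                       rng.uniform(0.7, 1.0, 5000),
                       rng.uniform(0.3, 0.8, 5000))
# A "shifted" target domain where confidences are lower overall.
target_conf = source_conf - 0.1
print(atc_predict_accuracy(source_conf, source_correct, target_conf))
```

By construction, applying the learned threshold back to the source confidences recovers (approximately) the source accuracy; the interesting output is the lower estimate on the shifted target distribution.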
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - A Symmetric Loss Perspective of Reliable Machine Learning [87.68601212686086]
We review how a symmetric loss can yield robust classification from corrupted labels in balanced error rate (BER) minimization.
We demonstrate how the robust AUC method can benefit natural language processing in the problem where we want to learn only from relevant keywords.
arXiv Detail & Related papers (2021-01-05T06:25:47Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.