A Data Balancing and Ensemble Learning Approach for Credit Card Fraud Detection
- URL: http://arxiv.org/abs/2503.21160v1
- Date: Thu, 27 Mar 2025 04:59:45 GMT
- Title: A Data Balancing and Ensemble Learning Approach for Credit Card Fraud Detection
- Authors: Yuhan Wang
- Abstract summary: This research introduces an innovative method for identifying credit card fraud by combining the SMOTE-KMEANS technique with an ensemble machine learning model. The proposed model was benchmarked against traditional models such as logistic regression, decision trees, random forests, and support vector machines. Results demonstrated that the proposed model achieved superior performance, with an AUC of 0.96 when combined with the SMOTE-KMEANS algorithm.
- Score: 1.8921747725821432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research introduces an innovative method for identifying credit card fraud by combining the SMOTE-KMEANS technique with an ensemble machine learning model. The proposed model was benchmarked against traditional models such as logistic regression, decision trees, random forests, and support vector machines. Performance was evaluated using metrics, including accuracy, recall, and area under the curve (AUC). The results demonstrated that the proposed model achieved superior performance, with an AUC of 0.96 when combined with the SMOTE-KMEANS algorithm. This indicates a significant improvement in detecting fraudulent transactions while maintaining high precision and recall. The study also explores the application of different oversampling techniques to enhance the performance of various classifiers. The findings suggest that the proposed method is robust and effective for classification tasks on balanced datasets. Future research directions include further optimization of the SMOTE-KMEANS approach and its integration into existing fraud detection systems to enhance financial security and consumer protection.
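The abstract names a SMOTE-KMEANS technique but this listing does not spell out its steps. One common reading, offered here only as a rough sketch and not as the author's procedure, is to cluster the minority (fraud) class with k-means and then create synthetic samples by SMOTE-style interpolation between neighbors inside each cluster. The function names `kmeans` and `smote_within_clusters`, their parameters, and the toy data are all hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def smote_within_clusters(minority, n_new, k_clusters=2, k_neighbors=3, seed=0):
    """Assumed SMOTE-KMEANS sketch: interpolate only inside k-means clusters."""
    rng = random.Random(seed)
    labels = kmeans(minority, k_clusters, seed=seed)
    clusters = [
        [p for p, lab in zip(minority, labels) if lab == c]
        for c in range(k_clusters)
    ]
    # A cluster needs at least two points to supply a (base, neighbor) pair.
    usable = [c for c in clusters if len(c) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more minority points")
    synthetic = []
    while len(synthetic) < n_new:
        cluster = rng.choice(usable)
        i = rng.randrange(len(cluster))
        base = cluster[i]
        # Nearest neighbors of the base point within the same cluster.
        neighbors = sorted(
            (p for j, p in enumerate(cluster) if j != i),
            key=lambda p: dist(base, p),
        )[:k_neighbors]
        other = rng.choice(neighbors)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class with two well-separated groups (made-up values).
fraud = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
new_points = smote_within_clusters(fraud, n_new=4)
print(new_points)
```

Restricting interpolation to a single cluster is meant to keep synthetic points from landing in the empty space between distant groups of fraud cases, which plain SMOTE can do. The balanced training set would then be the original data plus `new_points`, passed to whichever ensemble classifier is being evaluated.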
Related papers
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance. Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws. We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.
Deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.
This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- Enhancing Few-Shot Learning with Integrated Data and GAN Model Approaches [35.431340001608476]
This paper presents an innovative approach to enhancing few-shot learning by integrating data augmentation with model fine-tuning.
It aims to tackle the challenges posed by small-sample data in fields such as drug discovery, target recognition, and malicious traffic detection.
Results confirm that the MhERGAN algorithm developed in this research is highly effective for few-shot learning.
arXiv Detail & Related papers (2024-11-25T16:51:11Z)
- Integration of Active Learning and MCMC Sampling for Efficient Bayesian Calibration of Mechanical Properties [0.5242869847419834]
We show that a priori training of the surrogate model introduces large errors in the posterior estimation even in low to moderate dimensions.
We introduce a simple active learning strategy based on the path of the MCMC algorithm that is superior to all a priori trained models.
We identify the forward model as the bottleneck in the inference process, not the MCMC algorithm.
arXiv Detail & Related papers (2024-11-20T14:35:16Z)
- Explainable AI for Fraud Detection: An Attention-Based Ensemble of CNNs, GNNs, and A Confidence-Driven Gating Mechanism [5.486205584465161]
This study presents a new stacking-based approach for CCF detection by adding two extra layers to the usual classification process. In the attention layer, we combine soft outputs from a convolutional neural network (CNN) and a recurrent neural network (RNN) using the dependent ordered weighted averaging (DOWA) operator. In the confidence-based layer, we select whichever aggregate (DOWA or IOWA) shows lower uncertainty to feed into a meta-learner. Experiments on three datasets show that our method achieves high accuracy and robust generalization, making it effective for CCF detection.
arXiv Detail & Related papers (2024-10-01T09:56:23Z)
- Advanced Payment Security System:XGBoost, LightGBM and SMOTE Integrated [16.906931748453342]
This study explores the application of advanced machine learning models, specifically based on XGBoost and LightGBM.
By selecting highly correlated features, we aimed to strengthen the training process and boost model performance.
Our detailed analyses and comparisons reveal that the combination of SMOTE with XGBoost and LightGBM offers a highly efficient and powerful mechanism for payment security protection.
arXiv Detail & Related papers (2024-06-07T05:56:43Z)
- Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach [4.341096233663623]
This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance.
The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection.
arXiv Detail & Related papers (2024-02-27T02:26:04Z)
- Securing Transactions: A Hybrid Dependable Ensemble Machine Learning Model using IHT-LR and Grid Search [2.4374097382908477]
We introduce a state-of-the-art hybrid ensemble (ENS) Machine learning (ML) model that intelligently combines multiple algorithms to enhance fraud identification.
Our experiments are conducted on a publicly available credit card dataset comprising 284,807 transactions.
The proposed model achieves impressive accuracy rates of 99.66%, 99.73%, 98.56%, and 99.79%, and a perfect 100% for the DT, RF, KNN, and ENS models, respectively.
arXiv Detail & Related papers (2024-02-22T09:01:42Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Ensemble and Mixed Learning Techniques for Credit Card Fraud Detection [0.0]
We apply a mixed learning technique, which uses K-means preprocessing before trained classification, to the problem at hand.
We introduce an adapted detector ensemble technique that uses OR-logic algorithm aggregation to enhance the detection rate.
We observed from simulation results that the proposed methods diminished computational cost and enhanced performance.
arXiv Detail & Related papers (2021-12-05T17:17:04Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false-alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.