Related papers: Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data

URL: http://arxiv.org/abs/2208.11904v1
Date: Thu, 25 Aug 2022 07:30:31 GMT
Title: Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data
Authors: Gayan K. Kulatilleke, Sugandika Samarakoon
Abstract summary: We develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real world fraud detection data sets. We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As credit card transaction volumes grow, fraud percentages are also rising, along with the overhead costs institutions incur to combat fraud and compensate victims. The use of machine learning in the financial sector permits more effective protection against fraud and other economic crime. Suitably trained machine learning classifiers help proactive fraud detection, improving stakeholder trust and robustness against illicit transactions. However, the design of machine learning based fraud detection algorithms has been challenging and slow due to the massively unbalanced nature of fraud data and the difficulty of identifying frauds accurately and completely enough to create a gold standard ground truth. Furthermore, there are no benchmarks or standard classifier evaluation metrics to measure and identify better performing classifiers, which keeps researchers in the dark. In this work, we develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real world fraud detection data sets. By conducting empirical experiments on a hypothetical classifier, with a synthetic data distribution approximated to a popular real world credit card fraud data set, we simulate human annotation errors and extreme imbalance to observe the behavior of popular machine learning classifier evaluation metrics. We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
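To make the two metrics named in the abstract concrete, here is a minimal Python sketch that computes F1 and g-mean from confusion-matrix counts and ranks classifiers by F1 first, then g-mean. The abstract does not say how the paper combines the two scores, so the lexicographic ordering is an assumption made for illustration. The classifier names and counts are hypothetical and are not taken from the paper.

```python
import math


def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1, g-mean) from confusion-matrix counts; fraud is the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Hypothetical (tp, fp, fn, tn) counts on 100,000 transactions with 0.5% fraud.
candidates = {
    "always_legit": (0, 0, 500, 99_500),   # predicts "not fraud" for everything
    "model_a": (400, 100, 100, 99_400),
    "model_b": (450, 900, 50, 98_600),
}

scores = {name: f1_and_gmean(*counts) for name, counts in candidates.items()}

# Assumed combination: compare F1 first and use g-mean only to break ties.
ranking = sorted(scores, key=lambda name: scores[name], reverse=True)

for name in ranking:
    f1, gmean = scores[name]
    print(f"{name}: F1={f1:.3f}, g-mean={gmean:.3f}")
```

In this made-up example, the classifier that labels every transaction as legitimate reaches 99.5% accuracy yet scores zero on both F1 and g-mean, which is the kind of behavior under extreme imbalance that motivates looking beyond accuracy.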

Related papers

Adversarial Bias: Data Poisoning Attacks on Fairness [48.17618627431355]
There is relatively little research on how an AI system's fairness can be intentionally compromised. In this work, we provide a theoretical analysis demonstrating that a simple adversarial poisoning strategy is sufficient to induce maximally unfair behavior. Our attack significantly outperforms existing methods in degrading fairness metrics across multiple models and datasets.
arXiv Detail & Related papers (2025-11-11T15:09:53Z)
A Comprehensive Performance Comparison of Traditional and Ensemble Machine Learning Models for Online Fraud Detection [0.0]
Real-time fraud detection is essential for financial security but remains challenging due to high transaction volumes and the complexity of modern fraud patterns. This study presents a comprehensive comparison between traditional machine learning models like Random Forest, SVM, Logistic Regression, and ensemble methods like Stacking and Voting. The ensemble methods achieved an almost perfect precision of around 0.99, but traditional methods demonstrated superior performance in terms of recall.
arXiv Detail & Related papers (2025-09-21T17:53:24Z)
Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data [2.5670390559986442]
Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce. We systematically compare the performance of four supervised learning models on a large-scale, highly imbalanced online transaction dataset.
arXiv Detail & Related papers (2025-05-28T16:08:04Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
Explainable Fraud Detection with Deep Symbolic Classification [4.1205832766381985]
We present Deep Symbolic Classification, an extension of the Deep Symbolic Regression framework to classification problems. Because the functions are concise, closed-form mathematical expressions, the model is inherently explainable both at the level of a single classification decision and the model's decision process. An evaluation on the PaySim data set demonstrates competitive predictive performance with state-of-the-art models, while surpassing them in terms of explainability.
arXiv Detail & Related papers (2023-12-01T13:50:55Z)
Credit Card Fraud Detection with Subspace Learning-based One-Class Classification [18.094622095967328]
One-Class Classification (OCC) algorithms excel in handling imbalanced data distributions. These algorithms integrate subspace learning into the data description. These algorithms transform the data into a lower-dimensional subspace optimized for OCC.
arXiv Detail & Related papers (2023-09-26T12:26:28Z)
Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age. A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data. In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
Transaction Fraud Detection via an Adaptive Graph Neural Network [64.9428588496749]
We propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection. A neighbor sampling strategy is performed to filter noisy nodes and supplement information for fraudulent nodes. Experiments on three real financial datasets demonstrate that the proposed method ASA-GNN outperforms state-of-the-art ones.
arXiv Detail & Related papers (2023-07-11T07:48:39Z)
Credit card fraud detection - Classifier selection strategy [0.0]
Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds. Fraud data sets are diverse and exhibit inconsistent characteristics. We propose a data-driven classifier selection strategy for characteristic highly imbalanced fraud detection data sets.
arXiv Detail & Related papers (2022-08-25T07:13:42Z)
Challenges and Complexities in Machine Learning based Credit Card Fraud Detection [0.0]
The volume of transactions, the uniqueness of frauds, and the ingenuity of fraudsters are the main challenges in detecting frauds. The advent of machine learning, artificial intelligence and big data has opened up new tools in the fight against frauds. However, the development of fraud detection algorithms has been challenging and slow due to the massively unbalanced nature of fraud data.
arXiv Detail & Related papers (2022-08-20T07:53:51Z)
Prototype-Anchored Learning for Learning with Imperfect Annotations [83.7763875464011]
It is challenging to learn unbiased classification models from imperfectly annotated datasets. We propose a prototype-anchored learning (PAL) method, which can be easily incorporated into various learning-based classification schemes. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-06-23T10:25:37Z)
GAN based Data Augmentation to Resolve Class Imbalance [0.0]
In many related tasks, the datasets have a very small number of observed fraud cases. This imbalance may affect a learning model's behavior, leading it to predict all labels as the majority class. We trained a Generative Adversarial Network (GAN) to generate a large number of convincing (and reliable) synthetic examples of the minority class.
arXiv Detail & Related papers (2022-06-12T21:21:55Z)
Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data. A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions. We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.