Fraud Dataset Benchmark and Applications
- URL: http://arxiv.org/abs/2208.14417v3
- Date: Fri, 22 Sep 2023 14:50:22 GMT
- Title: Fraud Dataset Benchmark and Applications
- Authors: Prince Grover, Julia Xu, Justin Tittelfitz, Anqi Cheng, Zheng Li, Jakub Zablocki, Jianbo Liu, Hao Zhou
- Abstract summary: Fraud Dataset Benchmark (FDB) is a compilation of publicly available datasets tailored to fraud detection.
FDB comprises a variety of fraud-related tasks, ranging from identifying fraudulent card-not-present transactions, detecting bot attacks, classifying malicious URLs and estimating the risk of loan default to content moderation.
The Python-based library for FDB provides a consistent API for data loading with standardized training and testing splits.
- Score: 25.184342958800293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standardized datasets and benchmarks have spurred innovations in computer
vision, natural language processing, multi-modal and tabular settings. We note
that, compared to other well-researched fields, fraud detection has unique
challenges: high class imbalance, diverse feature types, frequently changing
fraud patterns, and the adversarial nature of the problem. As a result,
modeling approaches evaluated on datasets from other research fields may not
work well for fraud detection. In this paper, we introduce Fraud Dataset
Benchmark (FDB), a compilation of publicly available datasets tailored to fraud
detection. FDB comprises a variety of fraud-related tasks, ranging from
identifying fraudulent card-not-present transactions, detecting bot attacks,
classifying malicious URLs and estimating the risk of loan default to content
moderation. The Python-based library for FDB provides a consistent API for data
loading with standardized training and testing splits. We demonstrate several
applications of FDB that are of broad interest for fraud detection, including
feature engineering, comparison of supervised learning algorithms, label noise
removal, class-imbalance treatment and semi-supervised learning. We hope that
FDB provides a common playground for researchers and practitioners in the fraud
detection domain to develop robust and customized machine learning techniques
targeting various fraud use cases.
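The abstract's "consistent API for data loading with standardized training and testing splits" can be illustrated with a minimal sketch. This is not FDB's actual interface: `Split`, `stratified_split`, `REGISTRY`, `load` and the toy dataset are hypothetical names, and the sketch assumes that a standardized split means one stratified by label and fixed by a seed. Stratifying is one way to cope with the high class imbalance noted in the abstract: a purely random 80/20 split of a dataset with 1% fraud can leave the test set with noticeably more or fewer fraud cases than intended.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Split:
    """Row indices of one standardized train/test split (hypothetical container)."""
    train_idx: List[int]
    test_idx: List[int]


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0) -> Split:
    """Deterministic stratified split.

    Each class is shuffled with a private, seeded RNG and cut separately, so the
    fraud rate is roughly the same in train and test, and the same seed always
    yields the same indices for every user.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    rng = random.Random(seed)  # does not touch the global random state
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train: List[int] = []
    test: List[int] = []
    for y in sorted(by_class):  # fixed class order keeps the split reproducible
        idx = by_class[y]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return Split(sorted(train), sorted(test))


# Hypothetical registry: every dataset key maps to a loader returning (rows, labels).
Loader = Callable[[], Tuple[List[dict], List[int]]]
REGISTRY: Dict[str, Loader] = {}


def load(name: str, test_fraction: float = 0.2, seed: int = 0) -> Dict[str, list]:
    """Same call and same output keys for every dataset, whatever its source format."""
    rows, labels = REGISTRY[name]()
    split = stratified_split(labels, test_fraction, seed)
    return {
        "train": [rows[i] for i in split.train_idx],
        "train_labels": [labels[i] for i in split.train_idx],
        "test": [rows[i] for i in split.test_idx],
        "test_labels": [labels[i] for i in split.test_idx],
    }


if __name__ == "__main__":
    # Toy dataset with a 1% fraud rate; a real loader would download or read files.
    REGISTRY["toy_transactions"] = lambda: (
        [{"amount": float(i)} for i in range(1000)],
        [1] * 10 + [0] * 990,
    )
    data = load("toy_transactions", test_fraction=0.2, seed=42)
    print(len(data["train"]), len(data["test"]), sum(data["test_labels"]))  # 800 200 2
```

Under these assumptions, every dataset goes through the same `load` call and the split depends only on the labels and the seed, so two people comparing models on the same key evaluate on identical test rows. That reproducibility is what fixed benchmark splits are meant to provide.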
Related papers
- Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOTA) methods.
We design novel metrics including Approach Averaged Metric and Utility Regularized Metric, which can avoid deceptive results.
We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
Date: 2024-07-19T14:53:18Z
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using embeddings from large language models (LLMs).
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
Date: 2024-06-05T20:19:09Z
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
Date: 2024-06-05T13:40:07Z
- Transaction Fraud Detection via an Adaptive Graph Neural Network [64.9428588496749]
We propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection.
A neighbor sampling strategy is performed to filter noisy nodes and supplement information for fraudulent nodes.
Experiments on three real financial datasets demonstrate that the proposed method ASA-GNN outperforms state-of-the-art ones.
Date: 2023-07-11T07:48:39Z
- Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data [0.8223798883838329]
This paper implements the random forest (RF) algorithm to solve the issue at hand.
A dataset of credit card transactions was used in this study.
Date: 2023-03-11T22:59:37Z
- Weakly Supervised Anomaly Detection: A Survey [75.26180038443462]
Anomaly detection (AD) is a crucial task in machine learning with various applications.
We present the first comprehensive survey of weakly supervised anomaly detection (WSAD) methods.
For each setting, we provide formal definitions, key algorithms, and potential future directions.
Date: 2023-02-09T10:27:21Z
- Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data [0.0]
We develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real-world fraud detection data sets.
We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
Date: 2022-08-25T07:30:31Z
- Credit card fraud detection - Classifier selection strategy [0.0]
Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds.
Fraud data sets are diverse and exhibit inconsistent characteristics.
We propose a data-driven classifier selection strategy for the highly imbalanced data sets characteristic of fraud detection.
Date: 2022-08-25T07:13:42Z
- Challenges and Complexities in Machine Learning based Credit Card Fraud Detection [0.0]
The volume of transactions, the uniqueness of frauds and the ingenuity of fraudsters are the main challenges in detecting fraud.
The advent of machine learning, artificial intelligence and big data has opened up new tools in the fight against fraud.
However, the development of fraud detection algorithms has been challenging and slow due to the massively unbalanced nature of fraud data.
Date: 2022-08-20T07:53:51Z
- A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials [97.69553832500547]
This paper proposes a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models.
We exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, to the continual deepfake detection problem.
Date: 2022-05-11T13:07:19Z
- Applying support vector data description for fraud detection [0.0]
One of the main challenges in fraud detection is acquiring fraud samples, which is a complex and difficult task.
To deal with this challenge, we apply one-class classification methods such as support vector data description (SVDD), which do not need fraud samples for training.
We also present our algorithm REDBSCAN, an extension of DBSCAN that reduces the number of samples while selecting those that preserve the shape of the data.
Date: 2020-05-31T21:31:32Z
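The empirical metrics study in the list above names the F1 score and the g-mean as evaluation metrics for imbalanced fraud data. The sketch below shows how both are computed from confusion-matrix counts, with fraud (label 1) as the positive class, and why plain accuracy can mislead. The function names and the toy 1%-fraud example are illustrative assumptions; the sketch does not reproduce that paper's rule for combining the two metrics.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, treating 1 as fraud (positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:  # t == 0 and p == 0 under the binary-label assumption
            tn += 1
    return tp, fp, fn, tn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp: int, fp: int, fn: int, tn: int) -> float:
    """Geometric mean of recall on fraud (TPR) and recall on legitimate cases (TNR)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    # Toy data: 10 fraud cases among 1,000 transactions (1% fraud rate).
    y_true = [1] * 10 + [0] * 990
    always_legit = [0] * 1000                            # trivial model: never flags fraud
    detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970  # 8 caught, 2 missed, 20 false alarms
    for name, y_pred in [("always_legit", always_legit), ("detector", detector)]:
        tp, fp, fn, tn = confusion_counts(y_true, y_pred)
        acc = (tp + tn) / len(y_true)
        print(f"{name}: accuracy={acc:.3f} f1={f1(tp, fp, fn):.3f} g_mean={g_mean(tp, fp, fn, tn):.3f}")
```

The trivial model scores 0.990 accuracy but 0.000 on both F1 and g-mean, while the detector drops to 0.978 accuracy yet reaches F1 of about 0.421 and g-mean of about 0.885. Both metrics reward catching fraud, which accuracy largely ignores at a 1% base rate.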