A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation
- URL: http://arxiv.org/abs/2404.14746v1
- Date: Tue, 23 Apr 2024 04:57:44 GMT
- Title: A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation
- Authors: Phoebe Jing, Yijing Gao, Xianlong Zeng,
- Abstract summary: This study introduces a benchmark that contains structured datasets specifically designed for customer-level fraud detection.
The benchmark not only adheres to strict privacy guidelines to ensure user confidentiality but also provides a rich source of information by encapsulating customer-centric features.
- Score: 0.4681661603096334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of fraud detection, the availability of comprehensive and privacy-compliant datasets is crucial for advancing machine learning research and developing effective anti-fraud systems. Traditional datasets often focus on transaction-level information, which, while useful, overlooks the broader context of customer behavior patterns that are essential for detecting sophisticated fraud schemes. The scarcity of such data, primarily due to privacy concerns, significantly hampers the development and testing of predictive models that can operate effectively at the customer level. Addressing this gap, our study introduces a benchmark that contains structured datasets specifically designed for customer-level fraud detection. The benchmark not only adheres to strict privacy guidelines to ensure user confidentiality but also provides a rich source of information by encapsulating customer-centric features. We have developed the benchmark that allows for the comprehensive evaluation of various machine learning models, facilitating a deeper understanding of their strengths and weaknesses in predicting fraudulent activities. Through this work, we seek to bridge the existing gap in data availability, offering researchers and practitioners a valuable resource that empowers the development of next-generation fraud detection techniques.
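The abstract contrasts transaction-level records with customer-centric features. One common way to derive customer-level rows from transaction-level data is to group transactions by customer and compute summary statistics. The sketch below shows that roll-up under stated assumptions. The `Transaction` and `CustomerFeatures` schemas, their field names, and the chosen statistics are all hypothetical illustrations, not the benchmark's actual features or construction method.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transaction:
    # Hypothetical transaction-level record; field names are illustrative only,
    # not the benchmark's schema.
    customer_id: str
    amount: float
    merchant: str


@dataclass(frozen=True)
class CustomerFeatures:
    # Hypothetical customer-level summary derived from many transactions.
    customer_id: str
    n_transactions: int
    total_amount: float
    mean_amount: float
    max_amount: float
    n_distinct_merchants: int


def aggregate_by_customer(
    transactions: Iterable[Transaction],
) -> dict[str, CustomerFeatures]:
    """Roll transaction-level rows up into one feature row per customer."""
    grouped: dict[str, list[Transaction]] = defaultdict(list)
    for tx in transactions:
        grouped[tx.customer_id].append(tx)

    features: dict[str, CustomerFeatures] = {}
    for cid, txs in grouped.items():
        amounts = [tx.amount for tx in txs]
        features[cid] = CustomerFeatures(
            customer_id=cid,
            n_transactions=len(txs),
            total_amount=sum(amounts),
            mean_amount=mean(amounts),
            max_amount=max(amounts),
            # Spread across merchants is one behavioural signal a single
            # transaction cannot show on its own.
            n_distinct_merchants=len({tx.merchant for tx in txs}),
        )
    return features


if __name__ == "__main__":
    rows = [
        Transaction("c1", 20.0, "grocer"),
        Transaction("c1", 980.0, "electronics"),
        Transaction("c2", 15.0, "grocer"),
    ]
    for feats in aggregate_by_customer(rows).values():
        print(feats)
```

Each customer-level row can then be labelled and fed to a standard tabular classifier. Which aggregates are actually informative for fraud is an empirical question that this sketch does not address.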
Related papers
- Collaborative Knowledge Infusion for Low-resource Stance Detection [83.88515573352795]
Target-related knowledge is often needed to assist stance detection models.
We propose a collaborative knowledge infusion approach for low-resource stance detection tasks.
arXiv (2024-03-28T08:32:14Z)
- On the Potential of Network-Based Features for Fraud Detection [3.0846824529023382]
This article explores using the personalised PageRank (PPR) algorithm to capture the social dynamics of fraud.
The primary objective is to compare the performance of traditional features with the addition of PPR in fraud detection models.
Results indicate that integrating PPR enhances the model's predictive power, surpassing the baseline model.
arXiv (2024-02-14T13:20:09Z)
- Privacy-Preserving Financial Anomaly Detection via Federated Learning & Multi-Party Computation [17.314619091307343]
We describe a privacy-preserving framework that allows financial institutions to jointly train highly accurate anomaly detection models.
We show that our solution enables the network to train a highly accurate anomaly detection model while preserving privacy of customer data.
arXiv (2023-10-06T19:16:41Z)
- Re-thinking Data Availablity Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective.
We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv (2023-05-18T04:03:51Z)
- Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv (2023-04-21T09:03:18Z)
- Data AUDIT: Identifying Attribute Utility- and Detectability-Induced Bias in Task Models [8.420252576694583]
We present a first technique for the rigorous, quantitative screening of medical image datasets.
Our method decomposes the risks associated with dataset attributes in terms of their detectability and utility.
We show that this screening method reliably identifies nearly imperceptible bias-inducing artifacts.
arXiv (2023-04-06T16:50:15Z)
- Understanding Information Disclosure from Secure Computation Output: A Study of Average Salary Computation [58.74407460023331]
Quantifying information disclosure about private inputs from observing a function outcome is the subject of this work.
Motivated by the City of Boston gender pay gap studies, in this work we focus on the computation of the average of salaries.
arXiv (2022-09-21T15:59:48Z)
- Towards a Data Privacy-Predictive Performance Trade-off [2.580765958706854]
We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
Unlike previous literature, we confirm that the higher the level of privacy, the higher the impact on predictive performance.
arXiv (2022-01-13T21:48:51Z)
- Leaking Sensitive Financial Accounting Data in Plain Sight using Deep Autoencoder Neural Networks [1.9659095632676094]
We introduce a 'real-world threat model' designed to leak sensitive accounting data.
We show that a deep steganographic process, constituted by three neural networks, can be trained to hide such data in unobtrusive 'day-to-day' images.
arXiv (2020-12-13T17:29:53Z)
- PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework for Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL).
PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance.
Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv (2020-10-06T07:04:59Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv (2020-09-30T05:29:01Z)