Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering
- URL: http://arxiv.org/abs/2411.02926v1
- Date: Tue, 05 Nov 2024 09:13:53 GMT
- Title: Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering
- Authors: Fabrianne Effendi, Anupam Chattopadhyay
- Abstract summary: This research presents a novel privacy-preserving approach for collaborative AML machine learning.
It facilitates secure data sharing across institutions and borders while preserving privacy and regulatory compliance.
The research contributes two key privacy-preserving pipelines.
- Score: 4.1964397179107085
- Abstract: Combating money laundering has become increasingly complex with the rise of cybercrime and the digitalization of financial transactions. Graph-based machine learning techniques have emerged as promising tools for Anti-Money Laundering (AML) detection, capturing intricate relationships within money laundering networks. However, the effectiveness of AML solutions is hindered by data silos within financial institutions, which limit collaboration and overall efficacy. This research presents a novel privacy-preserving approach for collaborative AML machine learning, facilitating secure data sharing across institutions and borders while preserving privacy and regulatory compliance. By leveraging Fully Homomorphic Encryption (FHE), the approach performs computations directly on encrypted data, ensuring the confidentiality of financial data. Notably, FHE over the Torus (TFHE) was integrated with graph-based machine learning using Zama Concrete ML. The research contributes two key privacy-preserving pipelines. First, the development of a privacy-preserving Graph Neural Network (GNN) pipeline was explored; optimization techniques such as quantization and pruning were used to render the GNN FHE-compatible. Second, a privacy-preserving graph-based XGBoost pipeline leveraging the Graph Feature Preprocessor (GFP) was successfully developed. Experiments demonstrated strong predictive performance: the XGBoost model consistently achieved over 99% accuracy, F1-score, precision, and recall on the balanced AML dataset in both unencrypted and FHE-encrypted inference settings. On the imbalanced dataset, incorporating graph-based features improved the F1-score by 8%. The research highlights the need to balance the trade-off between privacy and computational efficiency.
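As a rough illustration of the second pipeline, the sketch below trains a quantized XGBoost classifier and runs FHE-encrypted inference with Zama Concrete ML. The synthetic features, labels, and hyperparameters are illustrative assumptions, not the paper's actual dataset or configuration, and the API shown follows recent Concrete ML releases.

```python
# Minimal sketch: FHE-encrypted XGBoost inference with Zama Concrete ML.
# All data and hyperparameters below are illustrative, not the paper's setup.
import numpy as np
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import XGBClassifier  # FHE-compatible scikit-learn-style model

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16)).astype(np.float32)  # stand-in for graph-derived transaction features
y = (X[:, 0] + X[:, 1] > 0).astype(np.int64)        # stand-in for licit/illicit labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_bits sets the quantization width; small values keep the TFHE circuit tractable.
model = XGBClassifier(n_bits=6, n_estimators=50, max_depth=4)
model.fit(X_train, y_train)

model.compile(X_train)                              # compile the model into an FHE circuit
y_clear = model.predict(X_test[:10])                # plaintext inference
y_fhe = model.predict(X_test[:10], fhe="execute")   # encrypted inference (much slower)
print("clear/FHE predictions agree:", (y_clear == y_fhe).all())
```

In the paper's pipeline the inputs would come from the Graph Feature Preprocessor rather than random data; the quantize-compile-encrypt flow itself is the same.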
Related papers
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
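As a toy illustration of the bounded-support idea (not the paper's calibrated mechanism), the sketch below adds Gaussian noise and then clamps the released value to a fixed interval; the bounds and noise scale are arbitrary assumptions.

```python
# Toy sketch of a Gaussian mechanism whose released output has bounded support.
# sigma, lo, and hi are arbitrary here; a real mechanism calibrates them to a privacy budget.
import numpy as np

def bounded_gaussian_release(value: float, sigma: float, lo: float, hi: float, rng) -> float:
    """Add Gaussian noise, then clamp the noisy release to [lo, hi]."""
    noisy = value + rng.normal(scale=sigma)
    return float(np.clip(noisy, lo, hi))

rng = np.random.default_rng(0)
print(bounded_gaussian_release(0.37, sigma=0.5, lo=0.0, hi=1.0, rng=rng))
```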
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption [2.611778281107039]
Privacy-Preserving Machine Learning (PPML) methods have been introduced to safeguard the privacy and security of Machine Learning models.
A modern cryptographic scheme, Hybrid Homomorphic Encryption (HHE), has recently emerged.
We develop and evaluate an HHE-based PPML application for classifying heart disease based on sensitive ECG data.
arXiv Detail & Related papers (2024-01-26T13:12:52Z)
- Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection [2.436659710491562]
Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data.
State-of-the-art FL solutions for identifying fraudulent financial transactions exhibit several limitations.
We introduce Starlit, a novel scalable privacy-preserving FL mechanism that overcomes these limitations.
arXiv Detail & Related papers (2024-01-19T15:37:11Z)
- Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection [11.167661320589488]
In real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally.
Our solution combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), and differential privacy (DP).
Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.
arXiv Detail & Related papers (2023-10-30T06:51:33Z)
- 2SFGL: A Simple And Robust Protocol For Graph-Based Fraud Detection [1.6427658855248812]
We propose a novel two-stage approach to federated graph learning (2SFGL).
The first stage involves the virtual fusion of multiparty graphs, and the second involves model training and inference on the virtual graph.
We evaluate our framework on a conventional fraud detection task based on the FraudAmazonDataset and FraudYelpDataset.
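A rough sketch of what the first stage (virtual fusion of multiparty graphs) might look like, assuming each party contributes an edge list with shared entity identifiers; networkx's compose is used purely for illustration, and the account IDs are hypothetical — the actual 2SFGL protocol is not reproduced here.

```python
# Illustrative only: fusing two parties' transaction graphs on shared entity IDs.
import networkx as nx

bank_a = nx.DiGraph([("acct1", "acct2"), ("acct2", "acct3")])  # hypothetical edge lists
bank_b = nx.DiGraph([("acct3", "acct4"), ("acct4", "acct1")])

virtual_graph = nx.compose(bank_a, bank_b)  # union of nodes and edges across parties
print(sorted(virtual_graph.edges()))
# Stage two would train and run a fraud-detection model on this virtual graph.
```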
arXiv Detail & Related papers (2023-10-12T13:48:26Z)
- LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering [5.478764356647437]
LaundroGraph is a novel self-supervised graph representation learning approach.
It provides insights to assist the anti-money laundering reviewing process.
To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.
arXiv Detail & Related papers (2022-10-25T21:58:02Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
- Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction.
FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
FedGRU is shown to achieve a prediction accuracy of 90.96%, comparable to advanced deep learning models.
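For intuition about the parameter-aggregation step, a minimal federated-averaging sketch is shown below; the weights and client dataset sizes are made up, and FedGRU's secure aggregation (masking/encryption) layer is omitted.

```python
# Minimal federated-averaging sketch: the server averages client model updates.
# Values are made up; the secure aggregation layer is omitted for brevity.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of per-client parameter vectors by local dataset size."""
    total = sum(client_sizes)
    stacked = np.stack(client_weights)
    coeffs = np.array(client_sizes, dtype=np.float64) / total
    return (coeffs[:, None] * stacked).sum(axis=0)

clients = [np.array([0.1, 0.2]), np.array([0.3, 0.0]), np.array([0.2, 0.4])]
print(federated_average(clients, client_sizes=[100, 50, 150]))
```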
arXiv Detail & Related papers (2020-03-19T13:07:49Z)
- CryptoSPN: Privacy-preserving Sum-Product Network Inference [84.88362774693914]
We present a framework for privacy-preserving inference of sum-product networks (SPNs).
CryptoSPN achieves highly efficient and accurate inference on the order of seconds for medium-sized SPNs.
arXiv Detail & Related papers (2020-02-03T14:49:18Z)