AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering
- URL: http://arxiv.org/abs/2506.13989v2
- Date: Thu, 25 Sep 2025 09:46:21 GMT
- Title: AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering
- Authors: Johan Östman, Edvin Callisen, Anton Chen, Kristiina Ausmees, Emanuel Gårdh, Jovan Zamac, Jolanta Goldsteine, Hugo Wefer, Simon Whelan, Markus Reimegård,
- Abstract summary: Money laundering enables organized crime by moving illicit funds into the legitimate economy. Detection rates remain low because launderers evade oversight, confirmed cases are rare, and institutions see only fragments of the global transaction network.
- Score: 1.6830247282478483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Money laundering enables organized crime by moving illicit funds into the legitimate economy. Although trillions of dollars are laundered each year, detection rates remain low because launderers evade oversight, confirmed cases are rare, and institutions see only fragments of the global transaction network. Since access to real transaction data is tightly restricted, synthetic datasets are essential for developing and evaluating detection methods. However, existing datasets fall short: they often neglect partial observability, temporal dynamics, strategic behavior, uncertain labels, class imbalance, and network-level dependencies. We introduce AMLGentex, an open-source suite for generating realistic, configurable transaction data and benchmarking detection methods. AMLGentex enables systematic evaluation of anti-money laundering systems under conditions that mirror real-world challenges. By releasing multiple country-specific datasets and practical parameter guidance, we aim to empower researchers and practitioners and provide a common foundation for collaboration and progress in combating money laundering.
Related papers
- Contamination Detection for VLMs using Multi-Modal Semantic Perturbation [73.76465227729818]
Open-source Vision-Language Models (VLMs) have achieved state-of-the-art performance on benchmark tasks. Pretraining corpora raise a critical concern for both practitioners and users: inflated performance due to test-set leakage. We show that existing detection approaches either fail outright or exhibit inconsistent behavior. We propose a novel, simple yet effective detection method based on multi-modal semantic perturbation.
arXiv Detail & Related papers (2025-11-05T18:59:52Z)
- RiskTagger: An LLM-based Agent for Automatic Annotation of Web3 Crypto Money Laundering Behaviors [65.80108147440863]
RiskTagger is a large-language-model-based agent for the automatic annotation of crypto laundering behaviors in Web3. RiskTagger is designed to replace or complement human annotators by addressing three key challenges: extracting clues from complex unstructured reports, reasoning over multichain transaction paths, and producing auditor-friendly explanations.
arXiv Detail & Related papers (2025-10-12T08:54:28Z)
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge. Users pay for services based on advertised model capabilities. Providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs. This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
- Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis [55.13545823385091]
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. In real-world applications, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. We show that even moderate levels of information sharing significantly mitigate environment-specific errors.
arXiv Detail & Related papers (2025-03-21T18:06:28Z)
- Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions [51.43521977132062]
Money laundering is a financial crime that obscures the origin of illicit funds. The proliferation of mobile payment platforms and smart IoT devices has significantly complicated anti-money laundering investigations. This paper conducts a comprehensive review of deep learning solutions and the challenges associated with their use in AML.
arXiv Detail & Related papers (2025-03-13T05:19:44Z)
- Towards Collaborative Anti-Money Laundering Among Financial Institutions [15.671365710671063]
Rule-based methods were first introduced and are still widely used in current detection systems. In practice, money laundering activities usually span multiple financial institutions. We propose the first algorithm that supports performing anti-money laundering over multiple institutions.
arXiv Detail & Related papers (2025-02-27T10:22:55Z)
- FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making. FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z)
- Beyond Static Datasets: A Behavior-Driven Entity-Specific Simulation to Overcome Data Scarcity and Train Effective Crypto Anti-Money Laundering Models [0.23020018305241333]
Money laundering is a key crime to mitigate, since stopping it also suspends the movement of funds from other illicit activities. It is becoming extremely difficult to identify money laundering in crypto transactions owing to the many layering strategies available today. In this paper, we propose a behavior-embedded, entity-specific simulation of money-laundering-like transactions.
arXiv Detail & Related papers (2025-01-01T06:58:05Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection, as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection [2.436659710491562]
Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data.
State-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations.
We introduce Starlit, a novel scalable privacy-preserving FL mechanism that overcomes these limitations.
arXiv Detail & Related papers (2024-01-19T15:37:11Z)
- Realistic Synthetic Financial Transactions for Anti-Money Laundering Models [2.3802629107286046]
Money laundering is the movement of illicit funds to conceal their origins.
The UN estimates that 2-5% of global GDP, or $0.8-$2.0 trillion, is laundered globally each year.
This paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML datasets.
arXiv Detail & Related papers (2023-06-22T10:32:51Z)
- A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z)
- Catch Me If You Can: Semi-supervised Graph Learning for Spotting Money Laundering [0.4159343412286401]
Money laundering is a process where criminals use financial services to move illegal money to untraceable destinations.
It is crucial to identify such activities accurately and reliably in order to enforce anti-money laundering (AML).
In this paper, we employ semi-supervised graph learning techniques on graphs of financial transactions in order to identify nodes involved in potential money laundering.
arXiv Detail & Related papers (2023-02-23T09:34:19Z)
- Mechanisms that Incentivize Data Sharing in Federated Learning [90.74337749137432]
We show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded.
We then introduce accuracy shaping based mechanisms to maximize the amount of data generated by each agent.
arXiv Detail & Related papers (2022-07-10T22:36:52Z)
- Fighting Money Laundering with Statistics and Machine Learning [95.42181254494287]
There is little scientific literature on statistical and machine learning methods for anti-money laundering.
We propose a unifying terminology with two central elements: (i) client risk profiling and (ii) suspicious behavior flagging.
arXiv Detail & Related papers (2022-01-11T21:31:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.