Combating Financial Crimes with Unsupervised Learning Techniques:
Clustering and Dimensionality Reduction for Anti-Money Laundering
- URL: http://arxiv.org/abs/2403.00777v1
- Date: Wed, 14 Feb 2024 17:31:29 GMT
- Title: Combating Financial Crimes with Unsupervised Learning Techniques: Clustering and Dimensionality Reduction for Anti-Money Laundering
- Authors: Ahmed N. Bakry, Almohammady S. Alsharkawy, Mohamed S. Farag, and Kamal R. Raslan
- Abstract summary: Anti-Money Laundering (AML) is a crucial task in ensuring the integrity of financial systems.
Unsupervised learning, particularly clustering, is a promising solution for this task.
In this paper, we investigate the effectiveness of combining agglomerative hierarchical clustering with four dimensionality reduction techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Anti-Money Laundering (AML) is a crucial task in ensuring the integrity of
financial systems. One key challenge in AML is identifying high-risk groups
based on their behavior. Unsupervised learning, particularly clustering, is a
promising solution for this task. However, the use of hundreds of features
to describe behavior results in a high-dimensional dataset that negatively
impacts clustering performance. In this paper, we investigate the effectiveness
of combining the agglomerative hierarchical clustering method with four
dimensionality reduction techniques, namely Independent Component Analysis (ICA),
Kernel Principal Component Analysis (KPCA), Singular Value Decomposition
(SVD), and Locality Preserving Projections (LPP), to overcome the issue of
high dimensionality in AML data and improve clustering results. This study aims
to provide insights into the most effective way of reducing the dimensionality
of AML data and to enhance the accuracy of clustering-based AML systems. The
experimental results demonstrate that KPCA outperforms the other dimensionality
reduction techniques when combined with agglomerative hierarchical clustering.
This superiority is observed in the majority of situations, as confirmed by
three distinct validation indices.
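The pipeline the abstract describes (reduce dimensionality, run agglomerative hierarchical clustering, then score the clusters with validation indices) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes scikit-learn and SciPy, uses synthetic `make_blobs` data in place of real AML features, and picks Ward linkage, 5 components, 4 clusters, and an RBF kernel for KPCA arbitrarily. It also assumes the Silhouette, Calinski-Harabasz, and Davies-Bouldin indices as the three validation indices, which the abstract does not name. scikit-learn has no LPP estimator, so the hypothetical helper `lpp` implements Locality Preserving Projections directly as a generalized eigenproblem on a heat-kernel k-nearest-neighbour graph.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import FastICA, KernelPCA, TruncatedSVD
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler


def lpp(X, n_components, n_neighbors=10):
    """Hypothetical helper: Locality Preserving Projections via a heat-kernel kNN graph."""
    n = X.shape[0]
    dist2 = cdist(X, X, "sqeuclidean")
    # Link each point to its nearest neighbours (sorted column 0 is the point itself,
    # assuming no duplicate points).
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = np.argsort(dist2, axis=1)[:, 1:n_neighbors + 1].ravel()
    t = dist2[rows, cols].mean()  # heat-kernel width: mean neighbour distance (an assumption)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist2[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrise the graph
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    # Solve X^T L X a = lambda X^T D X a; keep eigenvectors with the smallest eigenvalues.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # small ridge keeps B positive definite
    _, vecs = eigh(A, B)  # eigenvalues come back in ascending order
    return X @ vecs[:, :n_components]


def evaluate(X, n_clusters=4, n_components=5, seed=0):
    """Reduce X with each technique, cluster with Ward linkage, and score the labelling."""
    X = StandardScaler().fit_transform(X)
    reducers = {
        "KPCA": KernelPCA(n_components=n_components, kernel="rbf",
                          random_state=seed).fit_transform,
        "SVD": TruncatedSVD(n_components=n_components, random_state=seed).fit_transform,
        "ICA": FastICA(n_components=n_components, whiten="unit-variance",
                       max_iter=1000, random_state=seed).fit_transform,
        "LPP": lambda M: lpp(M, n_components),
    }
    results = {}
    for name, reduce in reducers.items():
        Z = reduce(X)
        labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(Z)
        results[name] = {
            "silhouette": silhouette_score(Z, labels),  # in [-1, 1], higher is better
            "calinski_harabasz": calinski_harabasz_score(Z, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(Z, labels),  # lower is better
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for high-dimensional behavioural features; not real AML data.
    X, _ = make_blobs(n_samples=300, n_features=100, centers=4, random_state=0)
    for name, scores in evaluate(X).items():
        print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

Here each index is computed in the reduced space that was clustered, so it measures separability within that embedding; scoring every labelling against the same standardized features would make the four reducers more directly comparable. On well-separated synthetic blobs all four reducers may score similarly, so this sketch neither reproduces nor tests the paper's finding that KPCA performs best.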
Related papers
- ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering [79.69917150582633]
Multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering.
Our method first discovers that MLLMs' hidden states of text tokens are strongly related to the corresponding features.
We also employ a lightweight clustering head augmented with pseudo-label learning, significantly enhancing clustering accuracy.
(2025-11-30T04:36:51Z)
- Decomposing Global AUC into Cluster-Level Contributions for Localized Model Diagnostics [1.104960878651584]
Area Under the ROC Curve (AUC) is a widely used performance metric for binary classifiers.
In high-stakes applications such as credit approval and fraud detection, these weaknesses can lead to financial risk or operational failures.
We introduce a formal decomposition of global AUC into intra- and inter-cluster components.
(2025-08-10T21:58:47Z)
- Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data [0.29465623430708915]
This paper presents a comprehensive analysis of prominent clustering algorithms K-means, DBSCAN, and Spectral Clustering on high-dimensional datasets.
We introduce a novel evaluation framework that assesses clustering performance across multiple dimensionality reduction techniques.
(2025-03-29T20:38:04Z)
- AdaptiveMDL-GenClust: A Robust Clustering Framework Integrating Normalized Mutual Information and Evolutionary Algorithms [0.0]
We introduce a robust clustering framework that integrates the Minimum Description Length (MDL) principle with a genetic optimization algorithm.
The framework begins with an ensemble clustering approach to generate an initial clustering solution, which is refined using MDL-guided evaluation functions and optimized through a genetic algorithm.
Experimental results demonstrate that our approach consistently outperforms traditional clustering methods, yielding higher accuracy, improved stability, and reduced bias.
(2024-11-26T20:26:14Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
One-step K-means dimensionality-reduction clustering methods have made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
(2024-09-24T08:59:51Z)
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
(2024-07-14T13:37:03Z)
- Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
(2024-07-03T13:44:20Z)
- Unfolding ADMM for Enhanced Subspace Clustering of Hyperspectral Images [43.152314090830174]
We introduce an innovative clustering architecture for hyperspectral images (HSI) by unfolding an iterative solver based on the Alternating Direction Method of Multipliers (ADMM) for sparse subspace clustering.
Our approach captures well the structural characteristics of HSI data by employing the K nearest neighbors algorithm as part of a structure preservation module.
(2024-04-10T15:51:46Z)
- Sampling-enabled scalable manifold learning unveils discriminative cluster structure of high-dimensional data [8.507955301076633]
We propose a sampling-based Scalable manifold learning technique that enables Uniform and Discriminative Embedding, namely SUDE, for large-scale and high-dimensional data.
We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks, and applied it to analyze single-cell data and detect anomalies in electrocardiogram (ECG) signals.
(2024-01-02T08:43:06Z)
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
(2023-11-26T19:01:15Z)
- Dynamic Clustering and Cluster Contrastive Learning for Unsupervised Person Re-identification [29.167783500369442]
Unsupervised Re-ID methods aim at learning robust and discriminative features from unlabeled data.
We propose a dynamic clustering and cluster contrastive learning (DCCC) method.
Experiments on several widely used public datasets validate the effectiveness of our proposed DCCC.
(2023-03-13T01:56:53Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
(2022-11-03T08:18:27Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
(2022-09-27T19:04:36Z)
- Cluster Analysis with Deep Embeddings and Contrastive Learning [0.0]
This work proposes a novel framework for performing image clustering from deep embeddings.
Our approach jointly learns representations and predicts cluster centers in an end-to-end manner.
Our framework performs on par with widely accepted clustering methods and outperforms the state-of-the-art contrastive learning method on the CIFAR-10 dataset.
(2021-09-26T22:18:15Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
(2020-10-22T15:58:35Z)
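Agglomerative hierarchical clustering, the method the AML paper pairs with dimensionality reduction and the subject of the last entry above, works bottom-up: every point starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains. The sketch below is a deliberately naive NumPy version for illustration only; it is neither the scalable method of that entry nor the AML paper's implementation, and the function name `naive_agglomerative` is hypothetical. Rescanning every cluster pair at every merge costs roughly O(n^3) time, plus O(n^2) memory for the distance matrix, which is the cost that scalable variants aim to avoid.

```python
import numpy as np


def naive_agglomerative(X, n_clusters, linkage="average"):
    """Bottom-up clustering: start from singletons and merge the closest pair until done."""
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix (O(n^2) memory).
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(X))]
    # Linkage rule: how a cluster-to-cluster distance is derived from point distances.
    link_fn = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        # Scan every pair of current clusters for the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link_fn(dist[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    print(naive_agglomerative(X, 2))  # prints [0 0 0 1 1 1]
```

Keeping the full distance matrix and recomputing linkage distances from it keeps the merge rule easy to read; production implementations instead update cluster distances incrementally (for example with the Lance-Williams formula) rather than rescanning point pairs.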