Fraud Detection Through Large-Scale Graph Clustering with Heterogeneous Link Transformation
- URL: http://arxiv.org/abs/2512.19061v1
- Date: Mon, 22 Dec 2025 05:59:13 GMT
- Title: Fraud Detection Through Large-Scale Graph Clustering with Heterogeneous Link Transformation
- Authors: Chi Liu,
- Abstract summary: Collaborative fraud, where multiple accounts coordinate to exploit online payment systems, poses significant challenges. Traditional detection methods that rely solely on high-confidence identity links suffer from limited coverage. We propose a novel graph-based fraud detection framework that addresses the challenge of large-scale heterogeneous graph clustering.
- Score: 3.4057438602175742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collaborative fraud, where multiple fraudulent accounts coordinate to exploit online payment systems, poses significant challenges due to the formation of complex network structures. Traditional detection methods that rely solely on high-confidence identity links suffer from limited coverage, while approaches using all available linkages often result in fragmented graphs with reduced clustering effectiveness. In this paper, we propose a novel graph-based fraud detection framework that addresses the challenge of large-scale heterogeneous graph clustering through a principled link transformation approach. Our method distinguishes between hard links (high-confidence identity relationships such as phone numbers, credit cards, and national IDs) and soft links (behavioral associations including device fingerprints, cookies, and IP addresses). We introduce a graph transformation technique that first identifies connected components via hard links, merges them into super-nodes, and then reconstructs a weighted soft-link graph amenable to efficient embedding and clustering. The transformed graph is processed using LINE (Large-scale Information Network Embedding) for representation learning, followed by HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) for density-based cluster discovery. Experiments on a real-world payment platform dataset demonstrate that our approach achieves significant graph size reduction (from 25 million to 7.7 million nodes), doubles the detection coverage compared to hard-link-only baselines, and maintains high precision across identified fraud clusters. Our framework provides a scalable and practical solution for industrial-scale fraud detection systems.
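The link transformation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `transform`, the `(account, identifier)` input format, and the choice of edge weight (the number of soft identifiers shared by two super-nodes) are all assumptions made for illustration, since the abstract does not specify how the soft-link weights are computed. The embedding (LINE) and clustering (HDBSCAN) stages are omitted.

```python
from collections import defaultdict
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used to find hard-link connected components."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def transform(accounts, hard_links, soft_links):
    """Collapse hard-link components into super-nodes, then build a
    weighted soft-link graph between super-nodes.

    hard_links / soft_links: iterables of (account, identifier) pairs
    (hypothetical input format). Returns (super_of, weights), where
    super_of maps account -> super-node id and weights maps an ordered
    super-node pair to the number of soft identifiers they share
    (an assumed weighting, not taken from the paper).
    """
    uf = UnionFind()
    for acc in accounts:
        uf.find(acc)  # register accounts that have no hard links

    # Step 1: accounts sharing a hard identifier fall in one component.
    first_seen = {}
    for acc, ident in hard_links:
        if ident in first_seen:
            uf.union(first_seen[ident], acc)
        else:
            first_seen[ident] = acc

    # Step 2: group super-nodes by the soft identifiers they touch.
    by_ident = defaultdict(set)
    for acc, ident in soft_links:
        by_ident[ident].add(uf.find(acc))

    # Step 3: one unit of weight per soft identifier shared by a pair.
    weights = defaultdict(int)
    for supers in by_ident.values():
        for u, v in combinations(sorted(supers), 2):
            weights[(u, v)] += 1

    super_of = {acc: uf.find(acc) for acc in list(uf.parent)}
    return super_of, dict(weights)
```

Soft links between accounts inside the same super-node disappear in this sketch, which is one way the node count can shrink. A widely shared identifier (for example a public IP address) produces a quadratic number of pairs in step 3, so a practical version would likely cap or down-weight such identifiers; that choice is also an assumption and is not described in the abstract.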
Related papers
- BotTrans: A Multi-Source Graph Domain Adaptation Approach for Social Bot Detection [55.31623652907614]
We propose a multi-source graph domain adaptation model named BotTrans for detecting social bots. We first leverage the labeling knowledge shared across multiple source networks to establish a cross-source-domain topology. We then aggregate cross-domain neighbor information to enhance the discriminability of source node embeddings.
arXiv Detail & Related papers (2025-06-12T02:10:36Z)
- Cluster Aware Graph Anomaly Detection [32.791460110557104]
We propose a cluster aware multi-view graph anomaly detection method, called CARE. Our approach captures both local and global node affinities by augmenting the graph's adjacency matrix with the pseudo-label. We show that the proposed similarity-guided loss is a variant of contrastive learning loss.
arXiv Detail & Related papers (2024-09-15T15:41:59Z)
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
- CONVERT:Contrastive Graph Clustering with Reliable Augmentation [110.46658439733106]
We propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT).
In our method, the data augmentations are processed by the proposed reversible perturb-recover network.
To further guarantee the reliability of semantics, a novel semantic loss is presented to constrain the network.
arXiv Detail & Related papers (2023-08-17T13:07:09Z)
- BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection [50.26074811655596]
We propose a novel unified graph anomaly detection framework based on bootstrapped self-supervised learning (named BOURNE).
By swapping the context embeddings between nodes and edges, we enable the mutual detection of node and edge anomalies.
BOURNE can eliminate the need for negative sampling, thereby enhancing its efficiency in handling large graphs.
arXiv Detail & Related papers (2023-07-28T00:44:57Z)
- Behavioral graph fraud detection in E-commerce [10.621640214806794]
We present a novel method, based on behavioral biometrics, that establishes links between transactions from similarities in user behavior.
To our knowledge, this is the first time similarity-based soft links have been used in graph embedding applications.
Our experiments show that embedding features learned from the similarity-based behavioral graph achieve a significant performance increase over the baseline fraud detection model.
arXiv Detail & Related papers (2022-10-13T12:47:09Z)
- Deep Fraud Detection on Non-attributed Graph [61.636677596161235]
Graph Neural Networks (GNNs) have shown solid performance on fraud detection.
Labeled data is scarce in large-scale industrial problems, especially for fraud detection.
We propose a novel graph pre-training strategy to leverage more unlabeled data.
arXiv Detail & Related papers (2021-10-04T03:42:09Z)
- Relational Graph Neural Networks for Fraud Detection in a Super-App environment [53.561797148529664]
We propose a framework of relational graph convolutional network methods for fraudulent behaviour prevention in the financial services of a Super-App.
We use an interpretability algorithm for graph neural networks to determine the most important relations to the classification task of the users.
Our results show that there is an added value when considering models that take advantage of the alternative data of the Super-App and the interactions found in their high connectivity.
arXiv Detail & Related papers (2021-07-29T00:02:06Z)
- Identifying Linked Fraudulent Activities Using GraphConvolution Network [0.0]
We present a novel approach to identify linked fraudulent activities using Graph Convolution Network (GCN).
GCNs learn similarities between fraudulent nodes to identify clusters of similar attempts and require a much smaller dataset to learn.
Our results outperform label propagation community detection and supervised GBTs algorithms in terms of solution quality and time.
arXiv Detail & Related papers (2021-06-05T09:56:08Z)
- Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.