Self-supervised Graph Representation Learning for Black Market Account Detection
- URL: http://arxiv.org/abs/2212.02679v1
- Date: Tue, 6 Dec 2022 00:42:00 GMT
- Title: Self-supervised Graph Representation Learning for Black Market Account Detection
- Authors: Zequan Xu, Lianyun Li, Hui Li, Qihang Sun, Shaofeng Hu, Rongrong Ji
- Abstract summary: Compared to fraudsters, black market accounts (BMAs) are not directly involved in frauds and are more difficult to detect.
This paper illustrates our BMA detection system SGRL (Self-supervised Graph Representation Learning) used in WeChat, a representative Multi-purpose Messaging Mobile App (MMMA) with over a billion users.
We deploy SGRL in the online environment to detect BMAs on the billion-scale WeChat graph, and it exceeds the alternative by 7.27% on the online evaluation measure.
- Score: 62.03978210281426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, Multi-purpose Messaging Mobile Apps (MMMAs) have become increasingly
prevalent. MMMAs attract fraudsters, and some cybercriminals provide support for
frauds via black market accounts (BMAs). Compared to fraudsters, BMAs are not
directly involved in frauds and are more difficult to detect. This paper
illustrates our BMA detection system SGRL (Self-supervised Graph Representation
Learning) used in WeChat, a representative MMMA with over a billion users. We
tailor Graph Neural Network and Graph Self-supervised Learning in SGRL for BMA
detection. The workflow of SGRL contains a pretraining phase that utilizes
structural information, node attribute information and available human
knowledge, and a lightweight detection phase. In offline experiments, SGRL
outperforms state-of-the-art methods by 16.06%-58.17% on offline evaluation
measures. We deploy SGRL in the online environment to detect BMAs on the
billion-scale WeChat graph, and it exceeds the alternative by 7.27% on the
online evaluation measure. In conclusion, SGRL can alleviate label reliance,
generalize well to unseen data, and effectively detect BMAs in WeChat.
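The abstract describes a two-phase workflow: label-free pretraining on the graph, then a lightweight detection phase. The sketch below is a toy illustration of that general pattern under stated assumptions, not SGRL's actual architecture, loss, or features, which the abstract does not specify. As hypothetical stand-ins it uses an edge-reconstruction objective with negative sampling to pretrain node embeddings from structure alone, and a logistic-regression detector fit on a handful of labelled nodes. The names `pretrain` and `train_detector`, the two-clique toy graph, and all hyperparameters are illustrative assumptions.

```python
import math
import random

# ASSUMPTION: hypothetical toy pipeline; the edge-reconstruction objective and the
# logistic-regression detector are illustrative stand-ins, not SGRL's own design.


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pretrain(num_nodes, edges, dim=8, epochs=200, lr=0.05, seed=0):
    """Phase 1, label-free: each edge is a positive pair, and a uniformly
    sampled node w gives a negative pair (u, w). Embeddings are fit by SGD on
    the binary cross-entropy of their dot-product scores."""
    rng = random.Random(seed)
    emb = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_nodes)]
    for _ in range(epochs):
        for u, v in edges:
            pairs = [(u, v, 1.0)]
            w = rng.randrange(num_nodes)
            if w not in (u, v):  # skip degenerate negative samples
                pairs.append((u, w, 0.0))
            for a, b, target in pairs:
                g = sigmoid(dot(emb[a], emb[b])) - target  # d(BCE)/d(score)
                ea, eb = emb[a], emb[b]
                # Update both endpoints simultaneously from the old vectors.
                emb[a] = [x - lr * g * y for x, y in zip(ea, eb)]
                emb[b] = [y - lr * g * x for x, y in zip(ea, eb)]
    return emb


def train_detector(features, labels, epochs=500, lr=0.1):
    """Phase 2, lightweight: logistic regression on frozen features, fit only
    on the few labelled nodes (node id -> 1 for a suspected BMA, 0 otherwise)."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for node, y in labels.items():
            x = features[node]
            g = sigmoid(dot(weights, x) + bias) - y
            weights = [wt - lr * g * xi for wt, xi in zip(weights, x)]
            bias -= lr * g
    return lambda x: sigmoid(dot(weights, x) + bias)


# Toy graph: two 5-node cliques joined by a single bridge edge (4, 5).
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges.append((4, 5))

emb = pretrain(10, edges)
score = train_detector(emb, {0: 1, 5: 0})  # one labelled node per clique
print([round(score(e), 2) for e in emb])
```

Because the detector only touches frozen embeddings and a few labels, it stays cheap to retrain. The real system also draws on node attributes and human knowledge during pretraining, which this toy omits.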
Related papers
- EMMM, Explain Me My Model! Explainable Machine Generated Text Detection in Dialogues [18.279628627710107]
Current machine-generated text (MGT) detection methods struggle in online conversational settings.
In customer service scenarios where operators are typically non-expert users, explanations become crucial for trustworthy MGT detection.
We propose EMMM, an explanation-then-detection framework that balances latency, accuracy, and non-expert-oriented interpretability.
arXiv Detail & Related papers (2025-08-26T06:27:10Z)
- GARG-AML against Smurfing: A Scalable and Interpretable Graph-Based Framework for Anti-Money Laundering [1.9461779294968458]
Money laundering is estimated to account for 2%-5% of the global GDP.
GARG-AML is a novel graph-based method that quantifies smurfing risk through a single interpretable metric.
arXiv Detail & Related papers (2025-06-04T11:30:37Z)
- A Label-Free Heterophily-Guided Approach for Unsupervised Graph Fraud Detection [60.09453163562244]
We propose a Heterophily-guided Unsupervised Graph fraud dEtection approach (HUGE) for unsupervised GFD.
In the estimation module, we design a novel label-free heterophily metric called HALO, which captures the critical graph properties for GFD.
In the alignment-based fraud detection module, we develop a joint-GNN architecture with ranking loss and asymmetric alignment loss.
arXiv Detail & Related papers (2025-02-18T22:07:36Z)
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric, the number of edits users introduce before committing the generated messages to the VCS, to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
EM-MIA achieves state-of-the-art results on WikiMIA.
arXiv Detail & Related papers (2024-10-10T03:31:16Z)
- SignSGD with Federated Voting [69.06621279967865]
SignSGD with majority voting (signSGD-MV) is an effective distributed learning algorithm that can significantly reduce communication costs by one-bit quantization.
We propose a novel signSGD with federated voting (signSGD-FV).
The idea of federated voting is to exploit learnable weights to perform weighted majority voting.
We demonstrate that the proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes.
arXiv Detail & Related papers (2024-03-25T02:32:43Z)
- M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce M4GT-Bench, a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs.
arXiv Detail & Related papers (2024-02-17T02:50:33Z)
- Multitask Active Learning for Graph Anomaly Detection [48.690169078479116]
We propose a novel MultItask acTIve Graph Anomaly deTEction framework, namely MITIGATE.
By coupling node classification tasks, MITIGATE obtains the capability to detect out-of-distribution nodes without known anomalies.
Empirical studies on four datasets demonstrate that MITIGATE significantly outperforms the state-of-the-art methods for anomaly detection.
arXiv Detail & Related papers (2024-01-24T03:43:45Z)
- Crowdsourcing Fraud Detection over Heterogeneous Temporal MMMA Graph [5.448839082856454]
We propose a novel contrastive multi-view learning method named CMT for crowdsourcing fraud detection over the heterogeneous temporal graph (HTG) of MMMA.
We deploy CMT to detect crowdsourcing frauds on an industry-size HTG of a representative MMMA, WeChat, and it significantly outperforms other methods.
arXiv Detail & Related papers (2023-08-05T05:35:40Z)
- MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark [14.91754326735955]
We propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first graph-based benchmark for account detection.
MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations.
Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations.
arXiv Detail & Related papers (2023-01-03T14:43:40Z)
- Fraudulent User Detection Via Behavior Information Aggregation Network (BIAN) On Large-Scale Financial Social Network [8.687460943376605]
We propose a novel behavior information aggregation network (BIAN) to combine the user behaviors with other user features.
The experimental results on a real-world large-scale financial social network dataset, DGraph, show that BIAN obtains a 10.2% gain in AUROC.
arXiv Detail & Related papers (2022-11-04T08:33:06Z)
- LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering [5.478764356647437]
LaundroGraph is a novel self-supervised graph representation learning approach.
It provides insights to assist the anti-money laundering reviewing process.
To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.
arXiv Detail & Related papers (2022-10-25T21:58:02Z)
- Detect Professional Malicious User with Metric Learning in Recommender Systems [39.26521260453495]
In e-commerce, online retailers often suffer from professional malicious users (PMUs), who use negative reviews and low ratings to threaten the retailers for illegal profits.
We propose an unsupervised multi-modal learning model: MMD, which employs Metric learning for professional Malicious users Detection with both ratings and reviews.
arXiv Detail & Related papers (2022-05-19T16:32:36Z)
- Deep Fraud Detection on Non-attributed Graph [61.636677596161235]
Graph Neural Networks (GNNs) have shown solid performance on fraud detection.
Labeled data is scarce in large-scale industrial problems, especially for fraud detection.
We propose a novel graph pre-training strategy to leverage more unlabeled data.
arXiv Detail & Related papers (2021-10-04T03:42:09Z)
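The signSGD-FV entry in the list describes replacing signSGD-MV's plain majority vote over one-bit gradient signs with a weighted majority vote. A minimal sketch of the two aggregation rules follows, assuming the per-worker weights are already given; how signSGD-FV learns them is not covered here, and the function names are hypothetical.

```python
# ASSUMPTION: toy illustration of (weighted) majority voting over gradient signs.
# The weights are fixed inputs here; signSGD-FV learns them, which is not shown.


def sign(x):
    # Returns 1, -1, or 0 (for exactly zero).
    return (x > 0) - (x < 0)


def majority_vote(grads):
    """signSGD-MV: workers send sign(g_i) per coordinate (one bit each);
    the server keeps the sign of the coordinate-wise sum."""
    return [sign(sum(sign(g[k]) for g in grads)) for k in range(len(grads[0]))]


def weighted_vote(grads, weights):
    """Weighted majority vote: each worker's signs are scaled by its weight."""
    return [
        sign(sum(w * sign(g[k]) for g, w in zip(grads, weights)))
        for k in range(len(grads[0]))
    ]


grads = [[0.3, -1.2], [-0.1, -0.4], [-2.0, 0.5]]  # three workers, two coordinates
print(majority_vote(grads))                   # [-1, -1]
print(weighted_vote(grads, [3.0, 1.0, 1.0]))  # [1, -1]
```

With equal positive weights the two rules agree; a large enough weight on the first worker lets its sign win coordinate 0.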