Semi-Supervised Supply Chain Fraud Detection with Unsupervised Pre-Filtering
- URL: http://arxiv.org/abs/2508.06574v1
- Date: Thu, 07 Aug 2025 11:25:09 GMT
- Title: Semi-Supervised Supply Chain Fraud Detection with Unsupervised Pre-Filtering
- Authors: Fatemeh Moradi, Mehran Tarif, Mohammadhossein Homaei,
- Abstract summary: Fraud in modern supply chains is a growing challenge, driven by the complexity of global networks and the scarcity of labeled data. Traditional detection methods often struggle with class imbalance and limited supervision, reducing their effectiveness in real-world applications. This paper proposes a novel two-phase learning framework to address these challenges. In the first phase, the Isolation Forest algorithm performs unsupervised anomaly detection to identify potential fraud cases and reduce the volume of data requiring further analysis. In the second phase, a self-training Support Vector Machine (SVM) refines the predictions using both labeled and high-confidence pseudo-labeled samples, enabling robust
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Detecting fraud in modern supply chains is a growing challenge, driven by the complexity of global networks and the scarcity of labeled data. Traditional detection methods often struggle with class imbalance and limited supervision, reducing their effectiveness in real-world applications. This paper proposes a novel two-phase learning framework to address these challenges. In the first phase, the Isolation Forest algorithm performs unsupervised anomaly detection to identify potential fraud cases and reduce the volume of data requiring further analysis. In the second phase, a self-training Support Vector Machine (SVM) refines the predictions using both labeled and high-confidence pseudo-labeled samples, enabling robust semi-supervised learning. The proposed method is evaluated on the DataCo Smart Supply Chain Dataset, a comprehensive real-world supply chain dataset with fraud indicators. It achieves an F1-score of 0.817 while maintaining a false positive rate below 3.0%. These results demonstrate the effectiveness and efficiency of combining unsupervised pre-filtering with semi-supervised refinement for supply chain fraud detection under real-world constraints, though we acknowledge limitations regarding concept drift and the need for comparison with deep learning approaches.
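The two-phase pipeline described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's IsolationForest and SVC, a linear kernel, a hypothetical keep fraction of 15% for the pre-filter, a hypothetical decision-margin threshold of 1.0 for accepting pseudo-labels, and synthetic Gaussian data standing in for the DataCo Smart Supply Chain Dataset. The helper names prefilter, self_train_svm, detect and f1_and_fpr are illustrative only. The metrics use the standard definitions F1 = 2TP / (2TP + FP + FN) and FPR = FP / (FP + TN).

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import SVC


def prefilter(X, keep_frac=0.15, seed=0):
    """Phase 1 (assumed form): score every record with an Isolation Forest
    and return the indices of the `keep_frac` most anomalous as candidates."""
    iso = IsolationForest(n_estimators=200, random_state=seed).fit(X)
    scores = -iso.score_samples(X)  # negate so that higher = more anomalous
    n_keep = int(round(keep_frac * len(X)))
    return np.argsort(scores)[::-1][:n_keep]


def self_train_svm(X, y, margin=1.0, max_iter=10):
    """Phase 2 (assumed form): self-training with a linear SVM.
    y holds 0/1 for labeled rows and -1 for unlabeled rows. Each round,
    unlabeled rows with |decision value| >= margin receive a pseudo-label."""
    y = y.copy()
    for _ in range(max_iter):
        labeled = y != -1
        unlabeled = np.flatnonzero(~labeled)
        if unlabeled.size == 0:
            break
        clf = SVC(kernel="linear").fit(X[labeled], y[labeled])
        d = clf.decision_function(X[unlabeled])  # > 0 means class 1 (fraud)
        confident = np.abs(d) >= margin
        if not confident.any():
            break
        y[unlabeled[confident]] = (d[confident] > 0).astype(int)
    # Final fit on the true labels plus all accepted pseudo-labels.
    labeled = y != -1
    return SVC(kernel="linear").fit(X[labeled], y[labeled])


def detect(X, y_true, keep_frac=0.15, label_frac=0.5, seed=0):
    """Run both phases; y_true is used only to simulate a labeling budget."""
    rng = np.random.default_rng(seed)
    cand = prefilter(X, keep_frac, seed)
    y_cand = np.full(len(cand), -1)
    reveal = rng.random(len(cand)) < label_frac  # candidates an analyst labels
    y_cand[reveal] = y_true[cand][reveal]
    if np.unique(y_cand[reveal]).size < 2:
        raise ValueError("need at least one labeled example of each class")
    clf = self_train_svm(X[cand], y_cand)
    pred = np.zeros(len(X), dtype=int)  # filtered-out records count as legitimate
    pred[cand] = clf.predict(X[cand])
    return pred, cand


def f1_and_fpr(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); FPR = FP / (FP + TN)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return float(f1), float(fpr)


# Synthetic stand-in data: 970 legitimate records, 30 shifted "fraud" records.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (970, 4)), rng.normal(4.0, 1.0, (30, 4))])
y = np.r_[np.zeros(970, dtype=int), np.ones(30, dtype=int)]
pred, cand = detect(X, y)
f1, fpr = f1_and_fpr(y, pred)
print(f"candidates={len(cand)}  F1={f1:.3f}  FPR={fpr:.2%}")
```

Records discarded in phase 1 are treated as legitimate here, which is one possible reading of "reduce the volume of data requiring further analysis"; a deployment could instead route them to a cheaper check. The printed figures come from synthetic data and are unrelated to the F1-score of 0.817 reported in the paper.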
Related papers
- Revisiting Multivariate Time Series Forecasting with Missing Values
Missing values are common in real-world time series.
Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data.
This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy.
We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv, 2025-09-27
- Adversarial Augmentation and Active Sampling for Robust Cyber Anomaly Detection
Advanced Persistent Threats (APTs) present a considerable challenge to cybersecurity due to their stealthy, long-duration nature.
Traditional supervised learning methods typically require large amounts of labeled data, which is often scarce in real-world scenarios.
This paper introduces a novel approach that combines AutoEncoders for anomaly detection with active learning to iteratively enhance APT detection.
arXiv, 2025-09-05
- Contrastive-KAN: A Semi-Supervised Intrusion Detection Framework for Cybersecurity with scarce Labeled Data
We propose a real-time intrusion detection system based on a semi-supervised contrastive learning framework using the Kolmogorov-Arnold Network (KAN).
Our method leverages abundant unlabeled data to effectively distinguish between normal and attack behaviors.
We validate our approach on three benchmark datasets, UNSW-NB15, BoT-IoT, and Gas Pipeline, using only 2.20%, 1.28%, and 8% of labeled samples, respectively.
arXiv, 2025-07-14
- Leveraging Ensemble-Based Semi-Supervised Learning for Illicit Account Detection in Ethereum DeFi Transactions
Decentralized Finance (DeFi) has introduced significant security risks, including the proliferation of illicit accounts.
Traditional detection methods are limited by the scarcity of labeled data and the evolving tactics of malicious actors.
We propose a novel Self-Learning Ensemble-based Illicit account Detection framework to address these challenges.
arXiv, 2024-12-03
- AnomalyAID: Reliable Interpretation for Semi-supervised Network Anomaly Detection
AnomalyAID aims to make the anomaly detection process interpretable and improve the reliability of interpretation results.
We propose a novel interpretation approach that leverages global and local interpreters to provide reliable explanations.
We design a new two-stage semi-supervised learning framework for network anomaly detection by aligning both stages' model predictions with special constraints.
arXiv, 2024-11-18
- Revisiting Class Imbalance for End-to-end Semi-Supervised Object Detection
Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods.
Many methods face challenges due to class imbalance, which hinders the effectiveness of the pseudo-label generator.
In this paper, we examine the root causes of low-quality pseudo-labels and present novel learning mechanisms to improve the label generation quality.
arXiv, 2023-06-04
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv, 2023-02-09
- Free Lunch for Generating Effective Outlier Supervision
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug significantly reduces the false positive rate, by over 12.50%, compared with previous schemes.
arXiv, 2023-01-17
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv, 2022-06-23
- Efficient Global Robustness Certification of Neural Networks via Interleaving Twin-Network Encoding
We formulate the global robustness certification for neural networks with ReLU activation functions as a mixed-integer linear programming (MILP) problem.
Our approach includes a novel interleaving twin-network encoding scheme, where two copies of the neural network are encoded side-by-side.
A case study of closed-loop control safety verification demonstrates the importance and practicality of our approach.
arXiv, 2022-03-26
- Towards Reducing Labeling Cost in Deep Object Detection
We propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector.
Our method pseudo-labels very confident predictions, suppressing potential distribution drift.
arXiv, 2021-06-22
- WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework performs strongly on the PASCAL-VOC and MSCOCO benchmarks, achieving results comparable to those obtained in fully-supervised settings.
arXiv, 2021-05-21
- Semi-supervised Long-tailed Recognition using Alternate Sampling
The main challenges in long-tailed recognition come from the imbalanced data distribution and the scarcity of samples in the tail classes.
We propose a new recognition setting, semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv, 2021-05-01
- Learning to Count in the Crowd from Limited Labeled Data
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv, 2020-07-07