Related papers: Fake detection in imbalance dataset by Semi-supervised learning with GAN

Fake detection in imbalance dataset by Semi-supervised learning with GAN

URL: http://arxiv.org/abs/2212.01071v5
Date: Wed, 20 Dec 2023 08:18:14 GMT
Title: Fake detection in imbalance dataset by Semi-supervised learning with GAN
Authors: Jinus Bordbar, Saman Ardalan, Mohammadreza Mohammadrezaie, Zahra Ghasemi
Abstract summary: Our study contributes to the field by achieving an 81% accuracy in detecting fake accounts using only 100 labeled samples. This demonstrates the potential of SGAN as a powerful tool for handling minority classes and addressing big data challenges in fake account detection.
Score: 1.4542411354617986
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As social media continues to grow rapidly, the prevalence of harassment on these platforms has also increased. This has piqued the interest of researchers in the field of fake detection. Social media data, often forms complex graphs with numerous nodes, posing several challenges. These challenges and limitations include dealing with a significant amount of irrelevant features in matrices and addressing issues such as high data dispersion and an imbalanced class distribution within the dataset. To overcome these challenges and limitations, researchers have employed auto-encoders and a combination of semi-supervised learning with a GAN algorithm, referred to as SGAN. Our proposed method utilizes auto-encoders for feature extraction and incorporates SGAN. By leveraging an unlabeled dataset, the unsupervised layer of SGAN compensates for the limited availability of labeled data, making efficient use of the limited number of labeled instances. Multiple evaluation metrics were employed, including the Confusion Matrix and the ROC curve. The dataset was divided into training and testing sets, with 100 labeled samples for training and 1,000 samples for testing. The novelty of our research lies in applying SGAN to address the issue of imbalanced datasets in fake account detection. By optimizing the use of a smaller number of labeled instances and reducing the need for extensive computational power, our method offers a more efficient solution. Additionally, our study contributes to the field by achieving an 81% accuracy in detecting fake accounts using only 100 labeled samples. This demonstrates the potential of SGAN as a powerful tool for handling minority classes and addressing big data challenges in fake account detection.

Related papers

A Conditional GAN for Tabular Data Generation with Probabilistic Sampling of Latent Subspaces [3.038642416291856]
We present ctdGAN, a conditional GAN for alleviating class imbalance in datasets.<n>ctdGAN executes a space partitioning step to assign cluster labels to the input samples.<n>It then utilizes these labels to synthesize samples via a novel probabilistic sampling strategy.<n>In this way, ctdGAN is trained to generate samples in subspaces that resemble those of the original data distribution.
arXiv Detail & Related papers (2025-08-01T09:49:57Z)
Few-Shot Learning with Adaptive Weight Masking in Conditional GANs [2.4299671488193497]
This paper introduces a novel approach to few-shot learning by employing a Residual Weight Masking Conditional Generative Adversarial Network (RWM-CGAN) for data augmentation. The proposed model integrates residual units within the generator to enhance network depth and sample quality, coupled with a weight mask regularization technique in the discriminator to improve feature learning from small-sample categories.
arXiv Detail & Related papers (2024-12-04T08:10:48Z)
Conditional Semi-Supervised Data Augmentation for Spam Message Detection with Low Resource Data [0.0]
We propose a conditional semi-supervised data augmentation for a spam detection model lacking the availability of data. We exploit unlabeled data for data augmentation to extend training data. Latent variables can come from labeled and unlabeled data as the input for the final classifier.
arXiv Detail & Related papers (2024-07-06T07:51:24Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
Empowering HWNs with Efficient Data Labeling: A Clustered Federated Semi-Supervised Learning Approach [2.046985601687158]
Clustered Federated Multitask Learning (CFL) has gained considerable attention as an effective strategy for overcoming statistical challenges. We introduce a novel framework, Clustered Federated Semi-Supervised Learning (CFSL), designed for more realistic HWN scenarios. Our results demonstrate that CFSL significantly improves upon key metrics such as testing accuracy, labeling accuracy, and labeling latency under varying proportions of labeled and unlabeled data.
arXiv Detail & Related papers (2024-01-19T11:47:49Z)
Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
Semi-supervised binary classification with latent distance learning [0.0]
We propose a new learning representation to solve the binary classification problem using a few labels with a random k-pair cross-distance learning mechanism. With few labels and without any data augmentation techniques, the proposed method outperformed state-of-the-art semi-supervised and self-supervised learning methods.
arXiv Detail & Related papers (2022-11-28T09:05:26Z)
Is margin all you need? An extensive empirical study of active learning on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark. Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including current state-of-art.
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem. We propose an innovative Inconsistency-based virtual aDvErial algorithm to further investigate SSL-AL's potential superiority. Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z)
Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals. We analyze the challenges these methods meet with the empirical experiment results. We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples. Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.