Related papers: Online detection and infographic explanation of spam reviews with data drift adaptation

URL: http://arxiv.org/abs/2406.15038v1
Date: Fri, 21 Jun 2024 10:35:46 GMT
Title: Online detection and infographic explanation of spam reviews with data drift adaptation
Authors: Francisco de Arriba-Pérez, Silvia García-Méndez, Fátima Leal, Benedita Malheiro, J. C. Burguillo
Abstract summary: This paper proposes an online solution for identifying and explaining spam reviews that incorporates data drift adaptation. It integrates (i) incremental profiling, (ii) data drift detection and adaptation, and (iii) identification of spam reviews using Machine Learning. The best results reached a spam F-measure of up to 87%.
Score: 4.278181795494584
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Spam reviews are a pervasive problem on online platforms due to their significant impact on reputation. However, research into spam detection in data streams is scarce. Another concern is the need for transparency in spam detection. Consequently, this paper addresses both problems by proposing an online solution for identifying and explaining spam reviews that incorporates data drift adaptation. It integrates (i) incremental profiling, (ii) data drift detection and adaptation, and (iii) identification of spam reviews using Machine Learning. The explainability mechanism displays a visual and textual explanation of each prediction in a dashboard. The best results reached a spam F-measure of up to 87%.
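The abstract names three stages: incremental profiling, drift detection and adaptation, and ML classification. It does not fix the concrete algorithms. The sketch below makes three assumptions, all illustrative choices and not the authors' method:

- a multinomial Naive Bayes learner updated one review at a time;
- a Page-Hinkley test on the 0/1 error signal;
- adaptation by discarding the model when drift is flagged.

The class and function names are hypothetical. The sketch shows how such a test-then-train loop fits together.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one review at a time."""

    def __init__(self, default_label="genuine"):
        self.default_label = default_label  # hypothetical fallback before any training
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.total_tokens = Counter()
        self.vocab = set()

    def learn(self, tokens, label):
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        if not self.class_counts:
            return self.default_label  # nothing learned yet
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for tok in tokens:
                # Laplace-smoothed log-likelihood of each token under this class
                score += math.log((self.token_counts[label][tok] + 1) / (self.total_tokens[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class PageHinkley:
    """Page-Hinkley test that flags an upward shift in the mean of a signal."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (lambda)
        self.reset()

    def reset(self):
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.cum - self.min_cum > self.threshold:
            self.reset()  # start monitoring afresh after an alarm
            return True
        return False


def run_stream(stream, detector):
    """Prequential (test-then-train) loop; returns total errors and drift positions."""
    model = OnlineNaiveBayes()
    errors, drifts = 0, []
    for i, (text, label) in enumerate(stream):
        tokens = text.lower().split()
        error = int(model.predict(tokens) != label)  # 1) test on the unseen review
        errors += error
        if detector.update(error):                    # 2) monitor the error signal
            drifts.append(i)
            model = OnlineNaiveBayes()                # 3) adapt: drop the stale model
        model.learn(tokens, label)                    # 4) train on the labelled review
    return errors, drifts


# Toy stream: an abrupt concept drift after 200 reviews (the two texts swap labels).
before = [("free money click now", "spam"), ("great hotel nice staff", "genuine")] * 100
after = [("free money click now", "genuine"), ("great hotel nice staff", "spam")] * 100
errors, drifts = run_stream(before + after, PageHinkley(delta=0.005, threshold=5.0))
print(errors, drifts)
```

On this toy stream, the detector fires a handful of reviews after the swap at position 200. The freshly reset model recovers after one more mistake. The `delta` and `threshold` values are arbitrary and would need tuning on real review streams.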

Related papers

SpaLLM-Guard: Pairing SMS Spam Detection Using Open-source and Commercial LLMs [1.3198171962008958]
We evaluate the potential of large language models (LLMs), both open-source and commercial, for SMS spam detection. We compare their performance across zero-shot, few-shot, fine-tuning, and chain-of-thought prompting approaches. Fine-tuning emerges as the most effective strategy, with Mixtral achieving 98.6% accuracy and a balanced false positive and false negative rate below 2%.
arXiv (2025-01-09T06:00:08Z)
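Accuracy alone can hide how errors split between the two classes. The helper below is an illustrative sketch; the name `spam_metrics` and the label strings are hypothetical. It shows that three figures come from the same confusion counts:

- the false positive rate;
- the false negative rate;
- the spam-class F-measure quoted for the main paper.

```python
def spam_metrics(y_true, y_pred, positive="spam"):
    """Confusion-matrix metrics with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "false_positive_rate": ratio(fp, fp + tn),  # legitimate messages flagged as spam
        "false_negative_rate": ratio(fn, fn + tp),  # spam that slipped through
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "ham"]
print(spam_metrics(y_true, y_pred))
```

Note that the false negative rate is simply one minus the recall of the spam class.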
TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD. Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT-4o. To automatically filter out low-quality annotations, we also propose to prompt GPT-4o to recognize tampered texts.
arXiv (2024-12-19T13:10:03Z)
A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases [0.037693031068634524]
This systematic review evaluates studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media.
arXiv (2024-10-26T23:55:50Z)
Methods for Generating Drift in Text Streams [49.3179290313959]
Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time. This paper provides four textual drift generation methods to ease the production of datasets with labeled drifts. Results show that the performance of all evaluated classifiers degrades right after the drifts, and that the incremental SVM is the fastest both to run and to recover its previous performance levels.
arXiv (2024-03-18T23:48:33Z)
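One simple way to obtain a labeled drift is to exchange two class labels from a known position onward. The helper below is a hypothetical sketch of that idea, not necessarily one of the four methods the paper proposes. With `start == end` it produces an abrupt swap. Otherwise the swap probability rises linearly across the window, giving a gradual drift.

```python
import random


def inject_label_swap(stream, start, end, class_a, class_b, seed=0):
    """Return a copy of a labelled text stream with a class-swap drift injected.

    Before `start`, labels are untouched. From `end` on, class_a and class_b are
    always exchanged. In between, each example is swapped with a probability that
    grows linearly from 0 to 1 (gradual drift). start == end gives an abrupt drift.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    rng = random.Random(seed)  # seeded so the drifted stream is reproducible
    swap = {class_a: class_b, class_b: class_a}
    drifted = []
    for i, (text, label) in enumerate(stream):
        if i >= end:
            p_swap = 1.0
        elif i >= start:
            p_swap = (i - start) / (end - start)
        else:
            p_swap = 0.0
        if rng.random() < p_swap:
            label = swap.get(label, label)  # labels outside the pair are kept
        drifted.append((text, label))
    return drifted


stream = [(f"review {i}", ["pos", "neg", "neu"][i % 3]) for i in range(12)]
abrupt = inject_label_swap(stream, 6, 6, "pos", "neg")
gradual = inject_label_swap(stream, 3, 9, "pos", "neg", seed=42)
print(abrupt)
```

Because the drift positions are chosen by the caller, they double as ground-truth labels for evaluating a drift detector.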
Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right? [135.71855998537347]
We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy. We show that this metric is unreliable, as even vacuous blind classifiers can achieve unrealistically high online accuracy. Existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information.
arXiv (2023-05-16T08:29:33Z)
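The failure mode is easy to reproduce. In the sketch below, online accuracy is computed test-then-train over a label sequence. The names are illustrative. The blind classifier here simply repeats the last label it has seen, which may differ from the exact baselines in the paper. It never looks at the input, yet it scores highly whenever labels arrive in long runs.

```python
def online_accuracy(labels, predict):
    """Test-then-train accuracy: predict each label from the history, then reveal it."""
    history, correct = [], 0
    for y in labels:
        if predict(history) == y:
            correct += 1
        history.append(y)  # the true label becomes available after the prediction
    return correct / len(labels)


def repeat_last_label(history):
    """A 'blind' classifier: ignores the input and repeats the previous label."""
    return history[-1] if history else None


# Temporally correlated stream: 10 classes, each arriving as a run of 100 examples.
runs = [c for c in range(10) for _ in range(100)]
# Same classes and counts, but interleaved so consecutive labels always differ.
interleaved = [i % 10 for i in range(1000)]

print(online_accuracy(runs, repeat_last_label))         # 0.99
print(online_accuracy(interleaved, repeat_last_label))  # 0.0
```

The same blind classifier reaches 99% or 0% online accuracy depending only on label order. This illustrates why the metric says little about what a model has learned about its inputs.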
Signed Latent Factors for Spamming Activity Detection [1.8275108630751844]
We propose a new approach that uses signed latent factors to filter fraudulent activities. The spam-contaminated relational datasets of multiple online applications are represented as a unified signed network. Experiments on real-world datasets from different kinds of Web applications indicate that latent factor models (LFMs) outperform state-of-the-art baselines in detecting spamming activities.
arXiv (2022-09-28T03:39:34Z)
Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to make it harder for an attacker to adapt their email and avoid detection. We develop a risk assessment framework that includes three models, which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type. Profiler can be used in conjunction with ML approaches to reduce their misclassifications, or as a labeller for large email datasets in the training stage.
arXiv (2022-08-18T10:01:55Z)
Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms [2.062593640149623]
Online reviews play a crucial role in helping consumers evaluate and compare products and services. Fake reviews (opinion spam) are becoming more prevalent and negatively impacting customers and service providers. We propose a new method for classifying reviewers as spammers or benign, combining machine learning with a message-passing algorithm.
arXiv (2022-05-26T15:27:46Z)
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. The model achieved high precision, recall, F1-score, and accuracy of 98.38%.
arXiv (2021-10-10T17:19:37Z)
Leveraging GPT-2 for Classifying Spam Reviews with Limited Labeled Data via Adversarial Training [1.8899300124593648]
We propose an adversarial training mechanism for classifying opinion spam with limited labeled data and a large set of unlabeled data. Experiments on TripAdvisor and YelpZip datasets show that the proposed model outperforms state-of-the-art techniques by at least 7% in terms of accuracy when labeled data is limited.
arXiv (2020-12-24T18:59:51Z)
ScoreGAN: A Fraud Review Detector based on Multi Task Learning of Regulated GAN with Data Augmentation [50.779498955162644]
We propose ScoreGAN for fraud review detection, which makes use of both review text and review rating scores in the generation and detection process. Results show that the proposed framework outperformed the existing state-of-the-art framework, FakeGAN, in terms of average precision (AP) by 7% and 5% on the Yelp and TripAdvisor datasets, respectively.
arXiv (2020-06-11T16:15:06Z)
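Average precision (AP) summarizes how well a ranking of reviews by fraud score places fraudulent reviews near the top. The sketch below computes the common uninterpolated form and makes two assumptions: binary labels (1 = fraudulent) and no tied scores. The summary does not say exactly how AP was computed in the ScoreGAN experiments, and the function name is illustrative.

```python
def average_precision(scores, labels):
    """Uninterpolated AP: mean of precision@k over the ranks k of positive items."""
    # Rank reviews from most to least suspicious (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for k, (_, is_fraud) in enumerate(ranked, start=1):
        if is_fraud:
            hits += 1
            precision_sum += hits / k  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 1]
print(average_precision(scores, labels))  # (1 + 2/3 + 3/5) / 3
```

A perfect ranking, with every fraudulent review scored above every genuine one, yields an AP of 1.0.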
Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals. We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv (2020-06-10T21:18:07Z)
DFraud3- Multi-Component Fraud Detection free of Cold-start [50.779498955162644]
Cold start is a significant problem that refers to the failure of a detection system to recognize the authenticity of a new user. In this paper, we model a review system as a Heterogeneous Information Network (HIN), which enables a unique representation of every component. HIN with graph induction helps to address the camouflage issue (fraudsters with genuine reviews), which has been shown to be more severe when coupled with cold start, i.e., new fraudsters with genuine first reviews.
arXiv (2020-06-10T08:20:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.