Related papers: An Optimized Machine Learning Classifier for Detecting Fake Reviews Using Extracted Features

URL: http://arxiv.org/abs/2511.21716v1
Date: Wed, 19 Nov 2025 10:05:23 GMT
Title: An Optimized Machine Learning Classifier for Detecting Fake Reviews Using Extracted Features
Authors: Shabbir Anees, Anshuman, Ayush Chaurasia, Prathmesh Bogar,
Abstract summary: We present an advanced machine-learning-based system that detects AI-generated reviews with high precision. Our method integrates advanced text preprocessing, multi-modal feature extraction, Harris Hawks Optimization, and a stacking ensemble classifier. Our final stacking model achieved 95.40% accuracy, 92.81% precision, 95.01% recall, and a 93.90% F1-score, which demonstrates that combining ensemble learning with bio-inspired optimisation is an effective method for machine-generated text recognition.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: It is well known that fraudulent reviews cast doubt on the legitimacy and dependability of online purchases. The most recent development misleading customers is the appearance of computer-generated (CG) reviews among human-written ones. In this work, we present an advanced machine-learning-based system that detects these AI-generated reviews with high precision. Our method integrates advanced text preprocessing, multi-modal feature extraction, Harris Hawks Optimization (HHO) for feature selection, and a stacking ensemble classifier. We implemented this methodology on a public dataset of 40,432 Original (OR) and Computer-Generated (CG) reviews. From an initial set of 13,539 features, HHO selected the 1,368 most relevant features, an 89.9% dimensionality reduction. Our final stacking model achieved 95.40% accuracy, 92.81% precision, 95.01% recall, and a 93.90% F1-score, which demonstrates that combining ensemble learning with bio-inspired optimisation is an effective method for machine-generated text recognition. Because large-scale review analytics commonly run on cloud platforms, privacy-preserving techniques such as differential privacy and secure outsourcing are essential to protect user data in these systems.
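The dimensionality-reduction and F1 figures in the abstract can be cross-checked from its other numbers: the reduction is 1 - 1,368/13,539, and F1 is the harmonic mean of precision and recall. A small sketch of that check, assuming the standard binary-classification F1 definition (accuracy cannot be recomputed this way, because the class counts behind it are not given here):

```python
def dimensionality_reduction(n_initial: int, n_selected: int) -> float:
    """Percentage of the original features removed by feature selection."""
    return 100.0 * (1 - n_selected / n_initial)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (output uses the same units as input)."""
    return 2 * precision * recall / (precision + recall)


reduction = dimensionality_reduction(13_539, 1_368)
f1 = f1_score(92.81, 95.01)  # precision and recall in percent, from the abstract
print(f"dimensionality reduction: {reduction:.1f}%")  # → dimensionality reduction: 89.9%
print(f"F1-score: {f1:.2f}%")  # → F1-score: 93.90%
```

Both derived values agree with the abstract at the precision it reports.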

Related papers

Phishing Detection System: An Ensemble Approach Using Character-Level CNN and Feature Engineering [0.0]
This paper presents an AI model for a phishing detection system. It uses an ensemble approach to combine character-level Convolutional Neural Networks (CNN) and LightGBM with engineered features. On a test dataset of 19,873 URLs, the ensemble model achieves an accuracy of 99.819 percent, precision of 100 percent, recall of 99.635 percent, and ROC-AUC of 99.947 percent.
arXiv Detail & Related papers (2025-12-18T16:19:12Z)
Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification [1.4120905648647635]
We formulate malware classification as a Markov Decision Process with episodic feature acquisition. We propose a Dueling Double Deep Q-Network (D3QN) framework for adaptive sequential feature selection. We evaluate our approach on the Microsoft Big2015 (9-class, 1,795 features) and BODMAS (binary, 2,381 features) datasets.
arXiv Detail & Related papers (2025-07-06T12:37:50Z)
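The D3QN summary casts feature selection as episodic feature acquisition in a Markov Decision Process. The sketch below is a hypothetical toy environment for that framing, assuming the state is the partially observed feature vector plus an acquisition mask, each acquisition costs a fixed penalty (`acquire_cost`, an illustrative value), and a final classification action earns +1 or -1. The paper's actual state, action, and reward design are not described here, and the D3QN agent that would learn Q-values over these actions is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureAcquisitionEnv:
    """Hypothetical episodic MDP: acquire features one at a time, then classify.

    Costs and rewards are illustrative assumptions, not values from the paper.
    """
    features: list[float]  # full feature vector of one sample (hidden from the agent)
    label: int  # true class of the sample
    n_classes: int
    acquire_cost: float = 0.01  # assumed penalty per acquisition step
    observed: list[bool] = field(init=False, default_factory=list)
    done: bool = field(init=False, default=False)

    def __post_init__(self) -> None:
        self.observed = [False] * len(self.features)

    def state(self) -> list[float]:
        """Acquired values (unacquired ones masked to 0.0) followed by the mask."""
        values = [v if seen else 0.0 for v, seen in zip(self.features, self.observed)]
        mask = [1.0 if seen else 0.0 for seen in self.observed]
        return values + mask

    def step(self, action: int) -> tuple[list[float], float, bool]:
        """Actions 0..n_features-1 acquire a feature; the next n_classes actions predict."""
        if self.done:
            raise RuntimeError("episode has already ended")
        n_features = len(self.features)
        if 0 <= action < n_features:
            self.observed[action] = True
            reward = -self.acquire_cost  # every acquisition step costs the same
        elif n_features <= action < n_features + self.n_classes:
            predicted = action - n_features
            reward = 1.0 if predicted == self.label else -1.0
            self.done = True  # predicting a class terminates the episode
        else:
            raise ValueError(f"invalid action {action}")
        return self.state(), reward, self.done


env = FeatureAcquisitionEnv([0.5, 2.0, -1.0], label=1, n_classes=2)
print(env.step(1))  # acquire feature 1
print(env.step(4))  # predict class 1 (action 3 + 1), ending the episode
```

Keeping the mask in the state lets an agent distinguish a feature whose value happens to be 0.0 from one it has not acquired yet; the per-step cost is what pushes a learned policy toward using fewer features.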
Towards Trustworthy Keylogger detection: A Comprehensive Analysis of Ensemble Techniques and Feature Selections through Explainable AI [0.0]
Keylogger detection involves monitoring for unusual system behaviors such as delays between typing and character display. In this study, we provide a comprehensive analysis of keylogger detection with traditional machine learning models.
arXiv Detail & Related papers (2025-05-22T01:04:13Z)
Evaluation of Artificial Intelligence Methods for Lead Time Prediction in Non-Cycled Areas of Automotive Production [1.3499500088995464]
The present study examines the effectiveness of applying Artificial Intelligence methods in an automotive production environment. Data structures are analyzed to identify contextual features and then preprocessed using one-hot encoding. The research demonstrates that AI methods can be effectively applied to highly variable production data, adding business value.
arXiv Detail & Related papers (2025-01-13T13:28:03Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinct features extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Transforming In-Vehicle Network Intrusion Detection: VAE-based Knowledge Distillation Meets Explainable AI [0.0]
This paper introduces an advanced intrusion detection system (IDS) called KD-XVAE that uses a Variational Autoencoder (VAE)-based knowledge distillation approach. Our model significantly reduces complexity, operating with just 1669 parameters and achieving an inference time of 0.3 ms per batch.
arXiv Detail & Related papers (2024-10-11T17:57:16Z)
Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods. We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation [56.13803674092712]
We propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR employs a two-step process: first, it ranks instruction pairs using a high-accuracy (84.25%) scoring model aligned with expert preferences; second, it preserves dataset diversity through clustering. In our experiment, CaR efficiently selected a mere 1.96% of Alpaca's IT data, yet the resulting AlpaCaR model surpassed Alpaca's performance by an average of 32.1% in GPT-4 evaluations.
arXiv Detail & Related papers (2024-02-28T09:27:29Z)
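CaR is described as a two-step selection: rank instruction pairs by a quality score, then use clustering to keep the selection diverse. A minimal sketch of that rank-then-cluster idea, assuming the quality scores and cluster assignments are already computed (for example by a scoring model and by k-means over embeddings); the function name and the `top_global`/`top_per_cluster` parameters are hypothetical, and this is not the authors' exact procedure:

```python
from collections import defaultdict


def rank_and_cluster_select(
    scores: list[float],
    clusters: list[int],
    top_global: int,
    top_per_cluster: int,
) -> list[int]:
    """Hypothetical rank-then-cluster selection; returns sorted item indices.

    scores: quality score per item (higher is better), assumed precomputed
    clusters: cluster id per item, assumed precomputed
    """
    # Step 1 (ranking): keep the globally highest-scored items.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(order[:top_global])

    # Step 2 (diversity): iterating in score order keeps each cluster's list sorted,
    # so the first entries of every cluster are its best-scored members.
    by_cluster: dict[int, list[int]] = defaultdict(list)
    for i in order:
        by_cluster[clusters[i]].append(i)
    for members in by_cluster.values():
        selected.update(members[:top_per_cluster])

    return sorted(selected)


print(rank_and_cluster_select([0.9, 0.8, 0.1, 0.2, 0.7], [0, 0, 1, 1, 0], 2, 1))
```

With these inputs the result is `[0, 1, 3]`: items 0 and 1 win on score alone, while item 3 is included only because it is the best member of cluster 1, which is the diversity-preserving step.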
Evaluating Machine Unlearning via Epistemic Uncertainty [78.27542864367821]
This work presents an evaluation of Machine Unlearning algorithms based on uncertainty. To the best of our knowledge, this is the first definition of a general evaluation for machine unlearning.
arXiv Detail & Related papers (2022-08-23T09:37:31Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. Our proposed algorithm appears to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.