Leveraging Big Data Frameworks for Spam Detection in Amazon Reviews
- URL: http://arxiv.org/abs/2509.21579v1
- Date: Thu, 25 Sep 2025 20:56:13 GMT
- Title: Leveraging Big Data Frameworks for Spam Detection in Amazon Reviews
- Authors: Mst Eshita Khatun, Halima Akter, Tasnimul Rehan, Toufiq Ahmed,
- Abstract summary: This research employs advanced big data analytics and machine learning approaches on a substantial dataset of Amazon product reviews. The primary objective is to detect and classify spam reviews accurately, thereby enhancing the authenticity of reviews. Logistic Regression achieves an accuracy of 90.35%, thus contributing to a more trustworthy and transparent online shopping environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this digital era, online shopping is a common practice in our daily lives. Product reviews significantly influence consumer buying behavior and help establish buyer trust. However, the prevalence of fraudulent reviews undermines this trust by potentially misleading consumers and damaging sellers' reputations. This research addresses this pressing issue by employing advanced big data analytics and machine learning approaches on a substantial dataset of Amazon product reviews. The primary objective is to detect and classify spam reviews accurately, thereby enhancing the authenticity of reviews. Using a scalable big data framework, we efficiently process and analyze large-scale review data, extracting key features indicative of fraudulent behavior. Our study illustrates the utility of various machine learning classifiers in detecting spam reviews, with Logistic Regression achieving an accuracy of 90.35%, thus contributing to a more trustworthy and transparent online shopping environment.
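The abstract reports Logistic Regression as the best classifier but does not describe its features or training setup. As a minimal sketch of what a logistic-regression spam classifier over review text can look like, the block below assumes bag-of-words features capped at a hypothetical `max_features` vocabulary size and plain batch gradient descent on L2-regularised log-loss, in pure Python on a single machine rather than the paper's unspecified big data framework. All function names (`train_logreg`, `predict_proba`, etc.) and the toy reviews are illustrative inventions, not the authors' code or data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(docs, max_features=1000):
    """Keep the max_features most frequent tokens across the training reviews."""
    counts = Counter(tok for d in docs for tok in tokenize(d))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_features))}

def featurize(text, vocab):
    """Sparse bag-of-words vector as {feature_index: count}; unknown tokens are dropped."""
    vec = Counter()
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(docs, labels, epochs=200, lr=0.1, l2=1e-3, max_features=1000):
    """Batch gradient descent on mean log-loss + (l2/2)*||w||^2; labels: 1 = spam, 0 = genuine."""
    vocab = build_vocab(docs, max_features)
    X = [featurize(d, vocab) for d in docs]
    w = [0.0] * len(vocab)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]  # gradient of the L2 penalty (bias is not regularised)
        grad_b = 0.0
        for x, y in zip(X, labels):
            z = b + sum(w[j] * v for j, v in x.items())
            err = sigmoid(z) - y  # derivative of log-loss w.r.t. z
            for j, v in x.items():
                grad_w[j] += err * v / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return vocab, w, b

def predict_proba(text, model):
    """Estimated probability that a review is spam."""
    vocab, w, b = model
    x = featurize(text, vocab)
    return sigmoid(b + sum(w[j] * v for j, v in x.items()))

# Toy, invented reviews (1 = spam, 0 = genuine); NOT data from the paper.
train_reviews = [
    "best product ever buy now",
    "amazing deal buy buy buy",
    "great great great five stars buy now",
    "the battery lasted about two days",
    "fits well but the strap feels thin",
    "arrived late and the box was damaged",
]
train_labels = [1, 1, 1, 0, 0, 0]
model = train_logreg(train_reviews, train_labels)
print(round(predict_proba("buy buy buy now amazing deal", model), 3))
```

At the scale described in the abstract, the same model would normally be trained with a distributed library rather than a Python loop; the sketch only shows the classifier's mechanics.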
Related papers
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. Instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 performance points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- What Matters in Explanations: Towards Explainable Fake Review Detection Focusing on Transformers [45.55363754551388]
Customers' reviews and feedback play a crucial role on e-commerce platforms like Amazon, Zalando, and eBay.
There is a prevailing concern that sellers often post fake or spam reviews to deceive potential customers and manipulate their opinions about a product.
We propose an explainable framework that detects fake reviews with high precision and explains why the identified content is fraudulent.
arXiv Detail & Related papers (2024-07-24T13:26:02Z)
- Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques [0.0]
This research paper proposes a machine learning model to identify deceptive reviews.
To accomplish this, an n-gram model with a maximum number of features is developed to identify deceptive content effectively.
The experimental results reveal that the Passive Aggressive classifier stands out among the algorithms tested.
arXiv Detail & Related papers (2023-07-20T06:35:43Z)
- Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
- 5-Star Hotel Customer Satisfaction Analysis Using Hybrid Methodology [0.0]
Our research proposes a new way to identify customer-satisfaction factors from review data.
Unlike many past studies on customer satisfaction, our research advances a novel thesis.
arXiv Detail & Related papers (2022-09-26T04:53:10Z)
- Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms [2.062593640149623]
Online reviews play a crucial role in helping consumers evaluate and compare products and services.
Fake reviews (opinion spam) are becoming more prevalent and negatively impacting customers and service providers.
We propose a new method for classifying reviewers as spammers or benign, combining machine learning with a message-passing algorithm.
arXiv Detail & Related papers (2022-05-26T15:27:46Z)
- Characterization of Frequent Online Shoppers using Statistical Learning with Sparsity [54.26540039514418]
This work reports a method to learn the shopping preferences of frequent shoppers to an online gift store by combining ideas from retail analytics and statistical learning with sparsity.
arXiv Detail & Related papers (2021-11-11T05:36:39Z)
- Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin [1.658669052286989]
Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product-ownership (i.e., reviews written by individuals who own or do not own the reviewed product).
Using an experimental design, we manipulate data-origin, product-ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26 - 69.87%) is somewhat detectable, but reviews additionally confounded with product-ownership (66.19 - 74.17%) or with data-origin (84.44 - 86.94%) are easier to classify.
arXiv Detail & Related papers (2021-10-28T14:04:03Z)
- Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution [0.9176056742068814]
Various approaches have been proposed for detecting opinion spam in online reviews.
We re-engineered a set of effective features used for classifying opinion spam.
We show that using the distributional features improves classifier performance.
arXiv Detail & Related papers (2020-12-27T10:23:44Z)
- Learning to Infer User Hidden States for Online Sequential Advertising [52.169666997331724]
We propose our Deep Intents Sequential Advertising (DISA) method to address these issues.
The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (called hidden states).
arXiv Detail & Related papers (2020-09-03T05:12:26Z)
- DFraud3- Multi-Component Fraud Detection free of Cold-start [50.779498955162644]
Cold start is a significant problem: a detection system fails to recognize the authenticity of a new user.
In this paper, we model a review system as a Heterogeneous Information Network (HIN), which gives every component a unique representation.
HIN with graph induction helps address the camouflage issue (fraudsters with genuine reviews), which has been shown to be more severe when coupled with cold start, i.e., new fraudsters with genuine first reviews.
arXiv Detail & Related papers (2020-06-10T08:20:13Z)
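The "Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution" entry above says features are "re-engineered" into distributional features but does not define the transform. One plausible reading, offered here only as an assumption, is to replace each raw feature value x with its empirical cumulative relative frequency F(x) = (number of reviews whose value is <= x) / n, computed per feature column. The sketch below implements that reading; the function names and the two example features (review length, exclamation-mark count) are hypothetical.

```python
from bisect import bisect_right

def crf_transform(column):
    """Map each raw value to the fraction of values in the column that are <= it (empirical CDF)."""
    ordered = sorted(column)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in column]

def crf_features(rows):
    """Apply crf_transform independently to every feature column of a row-major table."""
    columns = list(zip(*rows))
    transformed = [crf_transform(col) for col in columns]
    return [list(r) for r in zip(*transformed)]

# Hypothetical per-review features: (review length in words, number of exclamation marks)
reviews = [(120, 0), (8, 5), (45, 1), (8, 3)]
print(crf_features(reviews))  # → [[1.0, 0.25], [0.5, 1.0], [0.75, 0.5], [0.5, 0.75]]
```

Because every transformed value lies in (0, 1] and reflects rank rather than magnitude, features with very different scales become directly comparable, which is one reason such a transform could help downstream classifiers.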