Confounds and Overestimations in Fake Review Detection: Experimentally
Controlling for Product-Ownership and Data-Origin
- URL: http://arxiv.org/abs/2110.15130v1
- Date: Thu, 28 Oct 2021 14:04:03 GMT
- Title: Confounds and Overestimations in Fake Review Detection: Experimentally
Controlling for Product-Ownership and Data-Origin
- Authors: Felix Soldner, Bennett Kleinberg, Shane Johnson
- Abstract summary: Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product-ownership (i.e., reviews written by individuals who own or do not own the reviewed product).
Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26 - 69.87%) is somewhat detectable, but reviews additionally confounded with product-ownership (66.19 - 74.17%), or with data-origin (84.44 - 86.94%), are easier to classify.
- Score: 1.658669052286989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The popularity of online shopping is steadily increasing. At the same time,
fake product reviews are published widely and have the potential to affect
consumer purchasing behavior. In response, previous work has developed automated
methods for the detection of deceptive product reviews. However, studies vary
considerably in terms of classification performance, and many use data
that contain potential confounds, which makes it difficult to determine their
validity. Two possible confounds are data-origin (i.e., the dataset is composed
of more than one source) and product-ownership (i.e., reviews written by
individuals who own or do not own the reviewed product). In the present study,
we investigate the effect of both confounds for fake review detection. Using
an experimental design, we manipulate data-origin, product ownership, review
polarity, and veracity. Supervised learning analysis suggests that review
veracity (60.26 - 69.87%) is somewhat detectable, but reviews additionally
confounded with product-ownership (66.19 - 74.17%), or with data-origin (84.44 -
86.94%), are easier to classify. Review veracity is most easily classified if
confounded with product-ownership and data-origin combined (87.78 - 88.12%),
suggesting overestimations of the true performance in other work. These findings
are moderated by review polarity.
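The overestimation mechanism described above can be illustrated with a small, entirely synthetic sketch (the features, effect sizes, and data below are hypothetical, not the paper's materials): a weak "veracity" signal alone yields modest accuracy, while adding a feature that merely tracks data origin inflates the score without any better veracity detection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)                       # 0 = genuine, 1 = fake (toy labels)
veracity_signal = 0.4 * y + rng.normal(size=n)  # weak true signal
source_artifact = 2.0 * y + rng.normal(size=n)  # strong data-origin confound

controlled = veracity_signal.reshape(-1, 1)
confounded = np.column_stack([veracity_signal, source_artifact])

acc_controlled = cross_val_score(LogisticRegression(), controlled, y, cv=5).mean()
acc_confounded = cross_val_score(LogisticRegression(), confounded, y, cv=5).mean()
# The confounded setting scores markedly higher even though the veracity
# signal is unchanged, mirroring the overestimation pattern the paper reports.
```

The classifier in the confounded condition is rewarded for learning the origin artifact, which is exactly why datasets composed of more than one source can overstate fake-review detectability.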
Related papers
- Data Distribution Valuation [56.71023681599737]
Existing data valuation methods define a value for a discrete dataset.
In many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled.
We propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies.
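A minimal sketch of the MMD statistic that underlies this kind of distribution valuation (a standard biased estimator with an RBF kernel; the bandwidth and data here are illustrative, not the paper's method):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    # Biased estimate of squared maximum mean discrepancy:
    # the squared RKHS distance between empirical mean embeddings.
    kxx = rbf_kernel(x, x, gamma).mean()
    kyy = rbf_kernel(y, y, gamma).mean()
    kxy = rbf_kernel(x, y, gamma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shifted = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
# `same` stays near zero; `shifted` is clearly positive, so MMD can
# score how far a dataset's sampling distribution is from a reference.
```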
arXiv Detail & Related papers (2024-10-06T07:56:53Z) - Analytical and Empirical Study of Herding Effects in Recommendation Systems [72.6693986712978]
We study how to manage product ratings via rating aggregation rules and shortlisted representative reviews.
We show that proper recency aware rating aggregation rules can improve the speed of convergence in Amazon and TripAdvisor.
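One simple form of recency-aware aggregation is an exponentially decayed average, where newer ratings carry more weight (a generic sketch of the idea; the decay scheme below is illustrative, not the rule from the paper):

```python
def recency_weighted_rating(ratings, decay=0.9):
    # `ratings` ordered oldest -> newest; weight decays geometrically
    # with age, so the most recent rating has weight 1.
    weights = [decay ** (len(ratings) - 1 - i) for i in range(len(ratings))]
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)

# A product whose recent ratings improved scores higher than one whose
# recent ratings declined, even though the plain means are identical.
declining = recency_weighted_rating([5, 5, 1])
improving = recency_weighted_rating([1, 5, 5])
```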
arXiv Detail & Related papers (2024-08-20T14:29:23Z) - What Matters in Explanations: Towards Explainable Fake Review Detection Focusing on Transformers [45.55363754551388]
Customers' reviews and feedback play a crucial role on e-commerce platforms like Amazon, Zalando, and eBay.
There is a prevailing concern that sellers often post fake or spam reviews to deceive potential customers and manipulate their opinions about a product.
We propose an explainable framework that detects fake reviews with high precision and provides explanations for the content it flags as fraudulent.
arXiv Detail & Related papers (2024-07-24T13:26:02Z) - AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant
Reviews and Images on Social Media [57.70351255180495]
AiGen-FoodReview is a dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated.
We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA.
The paper contributes by open-sourcing the dataset and releasing fake review detectors, recommending its use in unimodal and multimodal fake review detection tasks, and evaluating linguistic and visual features in synthetic versus authentic data.
arXiv Detail & Related papers (2024-01-16T20:57:36Z) - Too Good To Be True: performance overestimation in (re)current practices
for Human Activity Recognition [49.1574468325115]
Sliding windows for data segmentation followed by standard random k-fold cross-validation produce biased results.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show it persists independently of the method or dataset.
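The bias can be demonstrated in a few lines (a synthetic sketch, not the paper's experimental setup): with 50%-overlap sliding windows, a random fold assignment puts windows that share raw samples into both train and test, so every test window leaks information from training.

```python
import numpy as np

# Simulate one long signal and segment it with 50%-overlap sliding windows,
# as is common in activity-recognition pipelines.
rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
win, step = 100, 50
n_windows = len(range(0, len(signal) - win + 1, step))

# A random (here: alternating) fold assignment ignores time order.
train_idx = list(range(0, n_windows, 2))
test_idx = list(range(1, n_windows, 2))

def overlaps(a, b, step=step, win=win):
    # Windows a and b share raw samples iff their start offsets differ
    # by less than one window length.
    return abs(a - b) * step < win

# Fraction of test windows that share samples with some training window.
leaky = sum(any(overlaps(t, s) for s in train_idx) for t in test_idx) / len(test_idx)
# With 50% overlap, every test window's immediate neighbours are in the
# training set, so the leakage fraction is 1.0.
```

A subject-wise or chronological split avoids this, which is why the random k-fold scores reported in this setting are optimistic.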
arXiv Detail & Related papers (2023-10-18T13:24:05Z) - Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques [0.0]
This research paper proposes a machine learning model to identify deceptive reviews.
To accomplish this, an n-gram model with a capped feature set (max features) is developed to effectively identify deceptive content.
The experimental results reveal that the passive aggressive classifier stands out among the various algorithms.
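A minimal sketch of the n-gram plus passive-aggressive pipeline this summary describes, using scikit-learn; the four toy reviews and labels below are hypothetical placeholders for a real labeled corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples for illustration only.
reviews = [
    "absolutely perfect best product ever buy now",
    "amazing amazing must buy five stars best",
    "the zipper broke after a week, support was slow",
    "decent battery life but the screen scratches easily",
]
labels = ["fake", "fake", "genuine", "genuine"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=5000),  # uni/bi-gram features, capped
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
model.fit(reviews, labels)
pred = model.predict(["best ever must buy now"])
```

The `max_features` cap corresponds to the "max features" selection mentioned above; the passive-aggressive classifier updates its weights only on misclassified or low-margin examples.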
arXiv Detail & Related papers (2023-07-20T06:35:43Z) - On the Role of Reviewer Expertise in Temporal Review Helpfulness
Prediction [5.381004207943597]
Existing methods for identifying helpful reviews primarily focus on review text and ignore two key factors: (1) who posts the reviews and (2) when the reviews are posted.
We introduce a dataset and develop a model that integrates the reviewer's expertise, derived from the past review history, and the temporal dynamics of the reviews to automatically assess review helpfulness.
arXiv Detail & Related papers (2023-02-22T23:41:22Z) - SentiLSTM: A Deep Learning Approach for Sentiment Analysis of Restaurant
Reviews [13.018530502810128]
This paper proposes a deep learning-based technique (i.e., BiLSTM) to classify the reviews provided by the clients of the restaurant into positive and negative polarities.
The results of the evaluation on the test dataset show that the BiLSTM technique produced the highest accuracy, 91.35%.
arXiv Detail & Related papers (2020-11-19T06:24:42Z) - ScoreGAN: A Fraud Review Detector based on Multi Task Learning of
Regulated GAN with Data Augmentation [50.779498955162644]
We propose ScoreGAN for fraud review detection that makes use of both review text and review rating scores in the generation and detection process.
Results show that the proposed framework outperformed the existing state-of-the-art framework, namely FakeGAN, in terms of AP by 7% and 5% on the Yelp and TripAdvisor datasets, respectively.
arXiv Detail & Related papers (2020-06-11T16:15:06Z) - Context-aware Helpfulness Prediction for Online Product Reviews [34.47368084659301]
We propose a neural deep learning model that predicts the helpfulness score of a review.
This model is based on convolutional neural network (CNN) and a context-aware encoding mechanism.
We validated our model on a human-annotated dataset, and the results show that our model significantly outperforms existing models for helpfulness prediction.
arXiv Detail & Related papers (2020-04-27T18:19:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.