Metadata Integration for Spam Reviews Detection on Vietnamese E-commerce Websites
- URL: http://arxiv.org/abs/2405.13292v2
- Date: Thu, 1 Aug 2024 07:46:25 GMT
- Title: Metadata Integration for Spam Reviews Detection on Vietnamese E-commerce Websites
- Authors: Co Van Dinh, Son T. Luu
- Abstract summary: We introduce the ViSpamReviews v2 dataset, which includes metadata of reviews.
We propose a novel approach to simultaneously integrate both textual and categorical attributes into the classification model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The problem of detecting spam reviews (opinions) has received significant attention in recent years, especially with the rapid development of e-commerce. Spam reviews are often classified based on comment content, but in some cases this is insufficient for models to accurately determine the review label. In this work, we introduce the ViSpamReviews v2 dataset, which includes review metadata, with the objective of integrating supplementary attributes into spam review classification. We propose a novel approach to simultaneously integrate both textual and categorical attributes into the classification model. In our experiments, the product category proved effective when combined with deep neural network (DNN) models, while text features performed well on both DNN models and PhoBERT, the model that achieved state-of-the-art performance in detecting spam reviews on Vietnamese e-commerce websites. Specifically, the PhoBERT model achieves its highest accuracy when combined with product description features generated by the SPhoBert model, a combination of PhoBERT and SentenceBERT. Measured by macro-averaged F1 score, the task of classifying spam reviews achieved 87.22% (an increase of 1.64% compared to the baseline), while the task of identifying the type of spam reviews achieved 73.49% (an increase of 1.93% compared to the baseline).
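The abstract reports its results as macro-averaged F1. As a minimal sketch of how that metric is computed (not the authors' evaluation code), the function below averages per-class F1 scores with equal weight per class. The function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 1 = spam, 0 = non-spam (not from ViSpamReviews v2)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because each class contributes equally regardless of its size, macro-averaging is a common choice when the spam and non-spam classes are imbalanced.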
Related papers
- JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction [59.94977231327573]
We propose a multi-task learning model with value generation/classification and attribute prediction called JPAVE.
Two variants of our model are designed for open-world and closed-world scenarios.
Experimental results on a public dataset demonstrate the superiority of our model compared with strong baselines.
arXiv Detail & Related papers (2023-11-07T18:36:16Z)
- Opinion mining using Double Channel CNN for Recommender System [0.0]
We present an approach for sentiment analysis with a deep learning model and use it to recommend products.
A two-channel convolutional neural network model has been used for opinion mining, which has five layers and extracts essential features from the data.
Our proposed method reaches 91.6% accuracy, a significant improvement over previous aspect-based approaches.
arXiv Detail & Related papers (2023-07-15T13:11:18Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Towards Personalized Review Summarization by Modeling Historical Reviews from Customer and Product Separately [59.61932899841944]
Review summarization is a non-trivial task that aims to summarize the main idea of a product review on an e-commerce website.
We propose the Heterogeneous Historical Review aware Review Summarization Model (HHRRS).
We employ a multi-task framework that conducts the review sentiment classification and summarization jointly.
arXiv Detail & Related papers (2023-01-27T12:32:55Z)
- Detecting Spam Reviews on Vietnamese E-commerce Websites [0.0]
We propose the dataset called ViSpamReviews, which has a strict annotation procedure for detecting spam reviews on e-commerce platforms.
Our dataset consists of two tasks: a binary classification task for detecting whether a review is spam or not, and a multi-class classification task for identifying the type of spam.
PhoBERT obtained the highest results on both tasks, 88.93% and 72.17%, respectively, by macro-averaged F1 score.
arXiv Detail & Related papers (2022-07-27T10:37:14Z)
- Leveraging GPT-2 for Classifying Spam Reviews with Limited Labeled Data via Adversarial Training [1.8899300124593648]
We propose an adversarial training mechanism for classifying opinion spam with limited labeled data and a large set of unlabeled data.
Experiments on TripAdvisor and YelpZip datasets show that the proposed model outperforms state-of-the-art techniques by at least 7% in terms of accuracy when labeled data is limited.
arXiv Detail & Related papers (2020-12-24T18:59:51Z)
- E-commerce Query-based Generation based on User Review [1.484852576248587]
We propose a novel seq2seq-based text generation model that generates answers to a user's question based on reviews posted by previous users.
Given a user question and/or target sentiment polarity, we extract aspects of interest and generate an answer that summarizes previous relevant user reviews.
arXiv Detail & Related papers (2020-11-11T04:58:31Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.