Related papers: Metadata Integration for Spam Reviews Detection on Vietnamese E-commerce Websites

Metadata Integration for Spam Reviews Detection on Vietnamese E-commerce Websites

URL: http://arxiv.org/abs/2405.13292v2
Date: Thu, 1 Aug 2024 07:46:25 GMT
Title: Metadata Integration for Spam Reviews Detection on Vietnamese E-commerce Websites
Authors: Co Van Dinh, Son T. Luu,
Abstract summary: We introduce the ViSpamReviews v2 dataset, which includes metadata of reviews. We propose a novel approach to simultaneously integrate both textual and categorical attributes into the classification model.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The problem of detecting spam reviews (opinions) has received significant attention in recent years, especially with the rapid development of e-commerce. Spam reviews are often classified based on comment content, but in some cases, it is insufficient for models to accurately determine the review label. In this work, we introduce the ViSpamReviews v2 dataset, which includes metadata of reviews with the objective of integrating supplementary attributes for spam review classification. We propose a novel approach to simultaneously integrate both textual and categorical attributes into the classification model. In our experiments, the product category proved effective when combined with deep neural network (DNN) models, while text features performed well on both DNN models and the model achieved state-of-the-art performance in the problem of detecting spam reviews on Vietnamese e-commerce websites, namely PhoBERT. Specifically, the PhoBERT model achieves the highest accuracy when combined with product description features generated from the SPhoBert model, which is the combination of PhoBERT and SentenceBERT. Using the macro-averaged F1 score, the task of classifying spam reviews achieved 87.22% (an increase of 1.64% compared to the baseline), while the task of identifying the type of spam reviews achieved an accuracy of 73.49% (an increase of 1.93% compared to the baseline).

Related papers

Improving E-commerce Search with Category-Aligned Retrieval [0.0]
Category-Aligned Retrieval System (CARS) improves search relevance by first predicting the product category from a user's query and then boosting products within that category.<n>We introduce a novel method for creating "Trainable Category Prototypes" from query embeddings.
arXiv Detail & Related papers (2025-09-03T20:43:52Z)
Rethinking Metrics and Benchmarks of Video Anomaly Detection [58.37571339811799]
Video Anomaly Detection (VAD) aims to detect anomalies that deviate from expectation.<n>Existing VAD metrics are influenced by single annotation bias.<n>Existing benchmarks lack the capability to evaluate scene overfitting of fully/weakly-supervised algorithms.
arXiv Detail & Related papers (2025-05-25T08:09:42Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction [59.94977231327573]
We propose a multi-task learning model with value generation/classification and attribute prediction called JPAVE. Two variants of our model are designed for open-world and closed-world scenarios. Experimental results on a public dataset demonstrate the superiority of our model compared with strong baselines.
arXiv Detail & Related papers (2023-11-07T18:36:16Z)
Opinion mining using Double Channel CNN for Recommender System [0.0]
We present an approach for sentiment analysis with a deep learning model and use it to recommend products. A two-channel convolutional neural network model has been used for opinion mining, which has five layers and extracts essential features from the data. Our proposed method has reached 91.6% accuracy, significantly improved compared to previous aspect-based approaches.
arXiv Detail & Related papers (2023-07-15T13:11:18Z)
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
Towards Personalized Review Summarization by Modeling Historical Reviews from Customer and Product Separately [59.61932899841944]
Review summarization is a non-trivial task that aims to summarize the main idea of the product review in the E-commerce website. We propose the Heterogeneous Historical Review aware Review Summarization Model (HHRRS) We employ a multi-task framework that conducts the review sentiment classification and summarization jointly.
arXiv Detail & Related papers (2023-01-27T12:32:55Z)
Detecting Spam Reviews on Vietnamese E-commerce Websites [0.0]
We propose the dataset called ViSpamReviews, which has a strict annotation procedure for detecting spam reviews on e-commerce platforms. Our dataset consists of two tasks: the binary classification task for detecting whether a review is a spam or not and the multi-class classification task for identifying the type of spam. The PhoBERT obtained the highest results on both tasks, 88.93% and 72.17%, respectively, by macro average F1 score.
arXiv Detail & Related papers (2022-07-27T10:37:14Z)
Leveraging GPT-2 for Classifying Spam Reviews with Limited Labeled Data via Adversarial Training [1.8899300124593648]
We propose an adversarial training mechanism for classifying opinion spam with limited labeled data and a large set of unlabeled data. Experiments on TripAdvisor and YelpZip datasets show that the proposed model outperforms state-of-the-art techniques by at least 7% in terms of accuracy when labeled data is limited.
arXiv Detail & Related papers (2020-12-24T18:59:51Z)
E-commerce Query-based Generation based on User Review [1.484852576248587]
We propose a novel seq2seq based text generation model to generate answers to user's question based on reviews posted by previous users. Given a user question and/or target sentiment polarity, we extract aspects of interest and generate an answer that summarizes previous relevant user reviews.
arXiv Detail & Related papers (2020-11-11T04:58:31Z)
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics. We find that AES models are highly overstable. Even heavy modifications(as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge. It can learn transferable knowledge from a subset of categories with limited labeled data. It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.