Opinion Spam Detection: A New Approach Using Machine Learning and
Network-Based Algorithms
- URL: http://arxiv.org/abs/2205.13422v1
- Date: Thu, 26 May 2022 15:27:46 GMT
- Title: Opinion Spam Detection: A New Approach Using Machine Learning and
Network-Based Algorithms
- Authors: Kiril Danilchenko, Michael Segal, Dan Vilenchik
- Abstract summary: Online reviews play a crucial role in helping consumers evaluate and compare products and services.
Fake reviews (opinion spam) are becoming more prevalent and negatively impacting customers and service providers.
We propose a new method for classifying reviewers as spammers or benign, combining machine learning with a message-passing algorithm.
- Score: 2.062593640149623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: E-commerce is the fastest-growing segment of the economy. Online reviews play
a crucial role in helping consumers evaluate and compare products and services.
As a result, fake reviews (opinion spam) are becoming more prevalent and
negatively impacting customers and service providers. There are many reasons
why it is hard to identify opinion spammers automatically, including the
absence of reliable labeled data. This limitation precludes an off-the-shelf
application of a machine learning pipeline. We propose a new method for
classifying reviewers as spammers or benign, combining machine learning with a
message-passing algorithm that capitalizes on the users' graph structure to
compensate for the possible scarcity of labeled data. We devise a new way of
sampling the labels for the training step (active learning), replacing the
typical uniform sampling. Experiments on three large real-world datasets from
Yelp.com show that our method outperforms state-of-the-art active learning
approaches and also machine learning methods that use a much larger set of
labeled data for training.
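The message-passing idea in the abstract can be illustrated with a rough sketch: spam scores from a few labeled reviewers are propagated over the users' graph, so unlabeled reviewers inherit evidence from their neighbours. The adjacency-matrix interface, damping factor, and clamping of seed labels below are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

def propagate_labels(adj, seed_scores, n_iters=20, damping=0.5):
    """Propagate spam scores over a reviewer graph (illustrative sketch).

    adj: (n, n) adjacency matrix of the users' graph (assumed representation).
    seed_scores: dict {node: score in [0, 1]} for the few labeled reviewers.
    Unlabeled nodes start at 0.5, an uninformative prior.
    """
    n = adj.shape[0]
    scores = np.full(n, 0.5)
    for node, s in seed_scores.items():
        scores[node] = s
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    for _ in range(n_iters):
        # each node mixes its own score with the mean score of its neighbours
        neighbour_mean = adj @ scores / deg
        scores = damping * scores + (1 - damping) * neighbour_mean
        # labeled seeds are clamped back to their known values every round
        for node, s in seed_scores.items():
            scores[node] = s
    return scores
```

On a path graph with a known spammer at one end and a known benign user at the other, nodes closer to the spammer end up with higher scores, which is the intended effect of exploiting the graph structure when labels are scarce.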
Related papers
- Loss-Free Machine Unlearning [51.34904967046097]
We present a machine unlearning approach that is both retraining- and label-free.
Retraining-free approaches often utilise Fisher information, which is derived from the loss and requires labelled data which may not be available.
We present an extension to the Selective Synaptic Dampening algorithm, replacing the diagonal of the Fisher information matrix with the gradient of the l2 norm of the model output to approximate sensitivity.
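The substitution described in this summary can be sketched for a toy linear model f(x) = W x; the model form, squaring, and averaging over inputs are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def output_norm_sensitivity(W, X):
    """Label-free parameter sensitivity, sketched for a linear model f(x) = W @ x.

    Instead of the Fisher diagonal (which requires a labelled loss), use the
    gradient of the squared l2 norm of the model output:
        grad_W ||W x||^2 = 2 (W x) x^T,
    squared element-wise and averaged over the inputs X.
    """
    sens = np.zeros_like(W)
    for x in X:
        g = 2.0 * np.outer(W @ x, x)  # gradient of ||Wx||^2 w.r.t. W
        sens += g ** 2
    return sens / len(X)
```

Because the quantity depends only on the model output, no labels are needed, which is the point the summary makes about avoiding Fisher information.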
arXiv Detail & Related papers (2024-02-29T16:15:34Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Random Relabeling for Efficient Machine Unlearning [8.871042314510788]
Individuals' right to retract personal data and relevant data privacy regulations pose great challenges to machine learning.
We propose unlearning scheme random relabeling to efficiently deal with sequential data removal requests.
A less constraining removal certification method based on probability distribution similarity with naive unlearning is also proposed.
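A minimal sketch of the random-relabeling idea (hypothetical interface, not the authors' code): each removed sample keeps its features but receives a uniformly random label, so a short fine-tune can erase its original signal without full retraining.

```python
import random

def random_relabel(dataset, removal_ids, n_classes, seed=0):
    """Overwrite the labels of removed samples with uniformly random ones.

    dataset: list of (features, label) pairs (assumed representation).
    removal_ids: indices of the samples covered by removal requests.
    The relabeled dataset would then be used for a brief fine-tuning pass.
    """
    rng = random.Random(seed)
    relabeled = []
    for i, (x, y) in enumerate(dataset):
        if i in removal_ids:
            y = rng.randrange(n_classes)  # random label destroys the original signal
        relabeled.append((x, y))
    return relabeled
```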
arXiv Detail & Related papers (2023-05-21T02:37:26Z)
- Active Learning with Combinatorial Coverage [0.0]
Active learning is a practical field of machine learning that automates the process of selecting which data to label.
Current methods are effective in reducing the burden of data labeling but are heavily model-reliant.
As a result, data sampled under one model often cannot be transferred to new models, and sampling bias remains a concern.
We propose active learning methods utilizing coverage to overcome these issues.
arXiv Detail & Related papers (2023-02-28T13:43:23Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- A pipeline and comparative study of 12 machine learning models for text classification [0.0]
Text is a highly favoured method of communication, especially in business environments.
Many machine learning methods for text classification have been proposed and incorporated into the services of most email providers.
However, optimising text classification algorithms and finding the right tradeoff on their aggressiveness is still a major research problem.
arXiv Detail & Related papers (2022-04-04T23:51:22Z)
- FedSEAL: Semi-Supervised Federated Learning with Self-Ensemble Learning and Negative Learning [7.771967424619346]
Federated learning (FL) is a popular decentralized and privacy-preserving machine learning framework.
In this paper, we propose a new FL algorithm, called FedSEAL, to solve the Semi-Supervised Federated Learning (SSFL) problem.
Our algorithm utilizes self-ensemble learning and complementary negative learning to enhance both the accuracy and the efficiency of clients' unsupervised learning on unlabeled data.
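The complementary (negative) learning mentioned here can be sketched as a per-sample loss: the supervision says only which class the sample is *not*, so the model is penalized for assigning probability to that class. The probability-vector interface below is an assumption for illustration.

```python
import math

def negative_learning_loss(probs, complementary_label):
    """Negative-learning loss for one sample (illustrative sketch).

    probs: predicted class-probability vector (assumed to sum to 1).
    complementary_label: index k of a class the sample is known NOT to be.
    Minimising -log(1 - p_k) pushes probability away from class k.
    """
    p_k = probs[complementary_label]
    return -math.log(max(1.0 - p_k, 1e-12))  # clamp for numerical safety
```

The loss is small when the model already assigns little probability to the forbidden class, and large when that class is the model's top pick, which is the training signal negative learning extracts from weak labels.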
arXiv Detail & Related papers (2021-10-15T03:03:23Z)
- Offline Learning from Demonstrations and Unlabeled Experience [62.928404936397335]
Behavioral Cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations.
The additional unlabeled experience can be generated by a variety of sources, such as human teleoperation, scripted policies, and other agents on the same robot.
We show that Offline Reinforced Imitation Learning (ORIL) consistently outperforms comparable BC agents by effectively leveraging unlabeled experience.
arXiv Detail & Related papers (2020-11-27T18:20:04Z)
- SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
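The teacher-to-student pseudo-labeling step described above can be sketched as follows; the `teacher(x) -> (label, confidence)` interface and the confidence threshold are illustrative assumptions, not SLADE's actual implementation.

```python
def pseudo_label(teacher, unlabeled, threshold=0.8):
    """Keep only unlabeled examples the teacher labels confidently.

    teacher: callable returning (predicted_label, confidence) for an input
             (assumed interface for this sketch).
    unlabeled: iterable of unlabeled inputs.
    Returns (input, pseudo_label) pairs for training the student.
    """
    pseudo = []
    for x in unlabeled:
        label, conf = teacher(x)
        if conf >= threshold:  # discard uncertain predictions
            pseudo.append((x, label))
    return pseudo
```

The student would then be trained on the union of the labeled data and these pseudo-labeled pairs, as the summary describes.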
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
- End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process.
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
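Confusion-matrix-weighted label aggregation can be sketched in a Dawid-Skene style: each annotator's vote is weighted by how reliable their estimated confusion matrix says they are. The uniform prior and the dictionary interface below are assumptions for illustration.

```python
import numpy as np

def aggregate_labels(annotations, confusions, n_classes):
    """Pick the most likely true label given noisy annotator votes.

    annotations: dict {annotator_id: observed_label}.
    confusions: dict {annotator_id: matrix C} where C[true, observed] is the
        annotator's estimated probability of reporting `observed` when the
        ground truth is `true` (assumed representation).
    Returns the argmax of the log-posterior under a uniform prior.
    """
    log_post = np.zeros(n_classes)
    for a, obs in annotations.items():
        # reliable annotators contribute sharper evidence than noisy ones
        log_post += np.log(confusions[a][:, obs] + 1e-12)
    return int(np.argmax(log_post))
```

With one reliable and one purely random annotator disagreeing, the aggregate follows the reliable one, which is the benefit the summary attributes to estimating confusion matrices.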
arXiv Detail & Related papers (2020-11-13T09:48:30Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.