Regularised Text Logistic Regression: Key Word Detection and Sentiment
Classification for Online Reviews
- URL: http://arxiv.org/abs/2009.04591v1
- Date: Wed, 9 Sep 2020 22:37:53 GMT
- Title: Regularised Text Logistic Regression: Key Word Detection and Sentiment
Classification for Online Reviews
- Authors: Ying Chen, Peng Liu, Chung Piaw Teo
- Abstract summary: We propose a Regularized Text Logistic regression model to perform text analytics and sentiment classification on unstructured text data.
We apply the RTL model to two online review datasets, Restaurant and Hotel, from TripAdvisor.
- Score: 8.036300326665538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online customer reviews have become important for managers and executives in
the hospitality and catering industry who wish to obtain a comprehensive
understanding of their customers' demands and expectations. We propose a
Regularized Text Logistic (RTL) regression model to perform text analytics and
sentiment classification on unstructured text data, which automatically
identifies a set of statistically significant and operationally insightful word
features, and achieves satisfactory predictive classification accuracy. We
apply the RTL model to two online review datasets, Restaurant and Hotel, from
TripAdvisor. Our results demonstrate satisfactory classification performance
compared with alternative classifiers with a highest true positive rate of
94.9%. Moreover, RTL identifies a small set of word features, corresponding to
3% for Restaurant and 20% for Hotel, which boosts working efficiency by
allowing managers to drill down into a much smaller set of important customer
reviews. We also establish the consistency, sparsity and oracle properties of
the estimator.
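To make the idea of a sparsity-inducing text logistic regression concrete, here is a minimal sketch in plain Python. It is not the authors' RTL estimator: the abstract does not state the penalty or the fitting procedure, so this sketch assumes an ordinary L1 (lasso) penalty fitted by proximal gradient descent as a stand-in. The toy reviews, the function names, and the values of `lam`, `step` and `iters` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy reviews (1 = positive, 0 = negative); not the TripAdvisor data.
REVIEWS = [
    ("great food friendly staff", 1),
    ("great location clean room", 1),
    ("friendly staff great view", 1),
    ("clean room great breakfast", 1),
    ("rude staff dirty room", 0),
    ("dirty table cold food", 0),
    ("rude waiter cold food", 0),
    ("dirty room rude staff", 0),
]


def build_vocab(texts):
    """Map each distinct word to a column index."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def featurize(text, vocab):
    """Binary bag-of-words vector; words outside the vocabulary are ignored."""
    x = [0.0] * len(vocab)
    for word in text.split():
        if word in vocab:
            x[vocab[word]] = 1.0
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink towards zero, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, step=0.5, iters=2000):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is left unpenalised (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(text, vocab, w, b):
    """Estimated probability that a review is positive."""
    x = featurize(text, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


texts = [t for t, _ in REVIEWS]
labels = [y for _, y in REVIEWS]
vocab = build_vocab(texts)
X = [featurize(t, vocab) for t in texts]
w, b = fit_l1_logistic(X, labels)

# Words whose coefficients survive the penalty act as the detected "key words".
selected = {word: round(w[j], 3) for word, j in vocab.items() if w[j] != 0.0}
print(f"kept {len(selected)} of {len(vocab)} words:", selected)
```

The soft-thresholding step sets small coefficients exactly to zero, which is what lets a penalised model report a short list of words rather than a weight for every term. A plain L1 penalty is not generally expected to have the oracle property mentioned in the abstract, so this sketch shows only the sparsity mechanism, not the paper's theoretical guarantees.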
Related papers
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences due to models' limitations and characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS).
arXiv Detail & Related papers (2024-09-30T12:36:25Z)
- TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
- Optimizing Multi-Class Text Classification: A Diverse Stacking Ensemble Framework Utilizing Transformers [0.0]
This study introduces a stacking ensemble-based multi-text classification method that leverages transformer models.
By combining multiple single transformers, including BERT, ELECTRA, and DistilBERT, an optimal predictive model is generated.
Experimental evaluations conducted on a real-world customer review dataset demonstrate the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2023-08-19T13:29:15Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Evaluating Factual Consistency of Texts with Semantic Role Labeling [3.1776833268555134]
We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind.
A final factuality score is computed by an adjustable scoring mechanism.
Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T17:59:42Z)
- Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within the data through its natural language counterpart, thus achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z)
- Efficient Few-Shot Fine-Tuning for Opinion Summarization [83.76460801568092]
Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples.
We show that a few-shot method based on adapters can easily store in-domain knowledge.
We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets.
arXiv Detail & Related papers (2022-05-04T16:38:37Z)
- Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression [0.0]
We provide new predictive analytics of customer churn rate based on a machine learning method that enhances the classification of logistic regression by adding a mixed penalty term.
We show the analytical properties of the proposed method and its computational advantage in this research.
arXiv Detail & Related papers (2021-05-17T08:40:34Z)
- Improved Customer Transaction Classification using Semi-Supervised Knowledge Distillation [0.0]
We propose a cost-effective transaction classification approach based on semi-supervision and knowledge distillation frameworks.
The approach identifies the category of a transaction using free text input given by the customer.
We use weak labelling and observe performance gains similar to those obtained with human-annotated samples.
arXiv Detail & Related papers (2021-02-15T16:16:42Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.