Enhancing Granular Sentiment Classification with Chain-of-Thought Prompting in Large Language Models
- URL: http://arxiv.org/abs/2505.04135v1
- Date: Wed, 07 May 2025 05:13:15 GMT
- Title: Enhancing Granular Sentiment Classification with Chain-of-Thought Prompting in Large Language Models
- Authors: Vihaan Miriyala, Smrithi Bukkapatnam, Lavanya Prahallad
- Abstract summary: We explore the use of Chain-of-Thought (CoT) prompting with large language models (LLMs) to improve the accuracy of granular sentiment categorization in app store reviews. We evaluated the effectiveness of CoT prompting versus simple prompting on 2000 Amazon app reviews by comparing each method's predictions to human judgements.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the use of Chain-of-Thought (CoT) prompting with large language models (LLMs) to improve the accuracy of granular sentiment categorization in app store reviews. Traditional numeric and polarity-based ratings often fail to capture the nuanced sentiment embedded in user feedback. We evaluated the effectiveness of CoT prompting versus simple prompting on 2000 Amazon app reviews by comparing each method's predictions to human judgements. CoT prompting improved classification accuracy from 84% to 93%, highlighting the benefit of explicit reasoning in enhancing sentiment analysis performance.
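The paper does not include code, so the following is a minimal sketch of the comparison it describes, not the authors' implementation. The OpenAI chat API, the model name, the five-way label set, and the prompt wording are all illustrative assumptions; only the overall setup (simple prompt vs. CoT prompt, predictions scored against human labels) comes from the abstract.

```python
# Minimal sketch: simple vs. chain-of-thought prompting for granular
# sentiment classification of app reviews.
# Assumptions NOT taken from the paper: the OpenAI chat API, the model
# name, the label set, and the exact prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical granular label set; the abstract does not list the
# authors' actual categories.
LABELS = ["strongly negative", "negative", "mixed", "positive",
          "strongly positive"]

SIMPLE_PROMPT = (
    "Classify the sentiment of this app review as one of: "
    + ", ".join(LABELS)
    + ". Answer with the label only.\n\nReview: {review}"
)

COT_PROMPT = (
    "Classify the sentiment of this app review as one of: "
    + ", ".join(LABELS) + ".\n"
    "First reason step by step: list the aspects the user mentions, "
    "the sentiment toward each, and how they combine overall.\n"
    "Then give the final label on its own line, prefixed with 'Label:'.\n\n"
    "Review: {review}"
)

def classify(review: str, template: str) -> str:
    """Run one review through a prompt template and return the raw output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper's model is unspecified
        messages=[{"role": "user", "content": template.format(review=review)}],
        temperature=0,  # deterministic decoding for a fair comparison
    )
    return response.choices[0].message.content.strip()

def extract_label(output: str) -> str:
    """Take the last line of the response and strip the 'Label:' prefix."""
    return output.splitlines()[-1].removeprefix("Label:").strip().lower()

def accuracy(reviews, template) -> float:
    """Fraction of predictions matching human labels, mirroring the
    paper's evaluation. `reviews` is an iterable of
    (review_text, human_label) pairs."""
    pairs = list(reviews)
    correct = sum(extract_label(classify(text, template)) == gold.lower()
                  for text, gold in pairs)
    return correct / len(pairs)
```

The paper reports that the CoT prompt lifts agreement with human judgements from 84% to 93% on 2000 Amazon app reviews; a harness like `accuracy()` above, run once per template, is the shape of that comparison.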
Related papers
- RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning [64.46921169261852]
RAG-Zeval is a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, enabling compact models to generate comprehensive and sound assessments. Experiments demonstrate RAG-Zeval's superior performance, achieving the strongest correlation with human judgments.
arXiv Detail & Related papers (2025-05-28T14:55:33Z)
- CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization [90.15027447565427]
Chain of thought (CoT) generates free-text explanations that help guide a model's predictions. Self-Consistency (SC) marginalizes predictions over multiple generated explanations. We propose Chain-of-Keywords (CoKe).
arXiv Detail & Related papers (2025-03-21T13:37:46Z)
- Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning [0.0]
We show that differences in learned regularities across answer options are predictive of model preferences and mirror human test-taking strategies.
We introduce two novel methods: Counterfactual Prompting with Chain of Thought (CoT) and Counterfactual Prompting with Agnostically Primed CoT (APriCoT).
Our results suggest that mitigating bias requires a "System-2" like process and that CoT reasoning is susceptible to confirmation bias under some prompting methodologies.
arXiv Detail & Related papers (2024-08-16T10:34:50Z)
- Markovian Transformers for Informative Language Modeling [0.9642500063568188]
Chain-of-Thought (CoT) reasoning often fails to faithfully reflect a language model's underlying decision process. We make CoT causally essential in a "Markovian" language model, factoring next-token prediction through an intermediate CoT and training it to predict future tokens independently of the original prompt.
arXiv Detail & Related papers (2024-04-29T17:36:58Z)
- Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction [40.09991896766369]
Multimodal Review Helpfulness Prediction (MRHP) aims to rank product reviews based on predicted helpfulness scores.
We propose a listwise attention network that clearly captures the MRHP ranking context.
We also propose gradient-boosted decision tree as the score predictor to efficaciously partition product reviews' representations.
arXiv Detail & Related papers (2023-05-22T03:31:00Z)
- Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on GSM8K, AQuA, and StrategyQA, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
- Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction [86.15787587540132]
We introduce the sensitivity score, a metric that scrutinizes models' behaviors at the vocabulary level.
Our experiments compare the decision-making logic of clinicians and classifiers based on rank correlations of sensitivity scores.
arXiv Detail & Related papers (2022-11-13T23:59:11Z)
- TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection [1.4720080476520687]
Punctuation and segmentation are key to readability in Automatic Speech Recognition output.
Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability.
We present TRScore, a novel readability measure using the GPT model to evaluate different segmentation and punctuation systems.
arXiv Detail & Related papers (2022-10-27T01:11:32Z)
- Rethinking and Refining the Distinct Metric [61.213465863627476]
We refine the calculation of distinct scores by re-scaling the number of distinct tokens based on their expectation (a minimal sketch of this re-scaling appears after this list).
We provide both empirical and theoretical evidence to show that our method effectively removes the biases exhibited in the original distinct score.
arXiv Detail & Related papers (2022-02-28T07:36:30Z)
- SIFN: A Sentiment-aware Interactive Fusion Network for Review-based Item Recommendation [48.1799451277808]
We propose a Sentiment-aware Interactive Fusion Network (SIFN) for review-based item recommendation.
We first encode user/item reviews via BERT and propose a lightweight sentiment learner to extract semantic features of each review.
Then, we propose a sentiment prediction task that guides the sentiment learner to extract sentiment-aware features via explicit sentiment labels.
arXiv Detail & Related papers (2021-08-18T08:04:38Z)
- A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss [51.448615489097236]
Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms.
We propose a novel dual-view model that jointly improves the performance of these two tasks.
Experiment results on four real-world datasets from different domains demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2020-06-02T13:34:11Z)
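As referenced in the "Rethinking and Refining the Distinct Metric" entry above, the re-scaling it describes can be sketched concretely. This is a minimal sketch assuming the expectation is taken under uniform sampling from a fixed vocabulary; the paper's exact normalization may differ in detail.

```python
def expectation_adjusted_distinct(tokens, vocab_size: int) -> float:
    """Distinct score re-scaled by the expected number of distinct tokens.

    Sketch of the re-scaling idea: instead of dividing the distinct-token
    count by the sequence length, divide by its expectation, which removes
    the length bias of the original distinct metric.
    """
    c = len(tokens)              # total generated tokens
    observed = len(set(tokens))  # observed distinct tokens
    # Expected number of distinct tokens when drawing c tokens uniformly
    # at random from a vocabulary of size vocab_size.
    expected = vocab_size * (1.0 - ((vocab_size - 1) / vocab_size) ** c)
    return observed / expected

# Example: a repetitive sequence scores far lower than a varied one
# of the same length.
print(expectation_adjusted_distinct(["good"] * 10, vocab_size=50_000))
print(expectation_adjusted_distinct(list("abcdefghij"), vocab_size=50_000))
```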
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.