Benchmarking ChatGPT and DeepSeek in April 2025: A Novel Dual Perspective Sentiment Analysis Using Lexicon-Based and Deep Learning Approaches
- URL: http://arxiv.org/abs/2509.19346v1
- Date: Tue, 16 Sep 2025 20:58:10 GMT
- Title: Benchmarking ChatGPT and DeepSeek in April 2025: A Novel Dual Perspective Sentiment Analysis Using Lexicon-Based and Deep Learning Approaches
- Authors: Maryam Mahdi Alhusseini, Mohammad-Reza Feizi-Derakhshi
- Abstract summary: This study presents a novel dual-perspective approach to analyzing user reviews for ChatGPT and DeepSeek on the Google Play Store. It integrates lexicon-based sentiment analysis (TextBlob) with deep learning classification models, including Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks.
- Score: 1.4968127458030251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study presents a novel dual-perspective approach to analyzing user reviews for ChatGPT and DeepSeek on the Google Play Store, integrating lexicon-based sentiment analysis (TextBlob) with deep learning classification models, including Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. Unlike prior research, which focuses on either lexicon-based strategies or predictive deep learning models in isolation, this study conducts an extensive investigation into user satisfaction with Large Language Model (LLM) based applications. A dataset of 4,000 authentic user reviews was collected, carefully preprocessed, and oversampled to achieve balanced classes. The balanced test set of 1,700 reviews was used for model testing. Results from the experiments reveal that ChatGPT received significantly more positive sentiment than DeepSeek. Furthermore, deep learning-based classification demonstrated superior performance over lexicon analysis, with CNN outperforming Bi-LSTM by achieving 96.41 percent accuracy and near-perfect classification of negative reviews, alongside high F1-scores for neutral and positive sentiments. This research sets a new methodological standard for measuring sentiment in LLM-based applications and provides practical insights for developers and researchers seeking to improve user-centric AI system design.
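The lexicon side of the dual-perspective pipeline scores each review by word-level polarity and maps the score to the three sentiment classes. A minimal sketch of that idea (TextBlob computes a polarity in [-1, 1] from its own lexicon; the word weights and threshold below are illustrative stand-ins, not TextBlob's actual values):

```python
# Illustrative mini-lexicon; a real lexicon (e.g. TextBlob's) is far larger.
LEXICON = {
    "great": 0.8, "love": 0.7, "helpful": 0.5,
    "slow": -0.4, "crash": -0.7, "terrible": -0.9,
}

def polarity(review: str) -> float:
    """Average lexicon weight over the review's words (0.0 if none match)."""
    words = review.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def label(review: str, threshold: float = 0.1) -> str:
    """Map the polarity score onto the study's three sentiment classes."""
    p = polarity(review)
    if p > threshold:
        return "positive"
    if p < -threshold:
        return "negative"
    return "neutral"
```

The deep learning side (CNN, Bi-LSTM) instead learns these decision boundaries from the labeled reviews, which is why the paper treats the two as complementary perspectives.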
Related papers
- Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later [76.66498833720411]
We introduce a differentiable version of $K$-nearest neighbors (KNN), originally designed to learn a linear projection to capture semantic similarities between instances. Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data. We conclude our paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures.
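The differentiable-KNN idea can be illustrated with a soft nearest-neighbor rule: replace the hard nearest-neighbor lookup with a softmax over negative squared distances, so every training point contributes a weight and the prediction becomes smooth. A toy sketch in plain Python (no learned projection, which is where NCA's trainable part would go):

```python
import math

def soft_knn_predict(query, points, labels, temperature=1.0):
    """NCA-style soft nearest neighbor: weight every training point by a
    softmax over negative squared distances, then vote per class."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    logits = [-sqdist(query, p) / temperature for p in points]
    m = max(logits)                        # stabilize the softmax
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)

    scores = {}
    for w, y in zip(weights, labels):
        scores[y] = scores.get(y, 0.0) + w / total
    return max(scores, key=scores.get)
```

Because the weights are a softmax rather than a hard arg-min, the class scores are differentiable with respect to the point coordinates, which is what lets a projection be trained with SGD.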
arXiv Detail & Related papers (2024-07-03T16:38:57Z) - SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL)
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG)
arXiv Detail & Related papers (2023-10-27T06:48:48Z) - The Languini Kitchen: Enabling Language Modelling Research at Different
Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer
Review [4.081120388114928]
In the field of cybersecurity, the practice of double-blind peer review is the de-facto standard.
This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences.
We investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models.
arXiv Detail & Related papers (2023-09-11T13:51:40Z) - Adversarial Capsule Networks for Romanian Satire Detection and Sentiment
Analysis [0.13048920509133807]
Satire detection and sentiment analysis are intensively explored natural language processing tasks.
In languages with fewer research resources, an alternative is to produce artificial examples based on character-level adversarial processes.
In this work, we improve the well-known NLP models with adversarial training and capsule networks.
The proposed framework outperforms the existing methods for the two tasks, achieving up to 99.08% accuracy.
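Character-level adversarial processes of the kind mentioned above can be as simple as typo-style perturbations of the input string. A hypothetical sketch (the adjacent-swap rule is one common perturbation, used here for illustration, not the paper's actual procedure):

```python
def swap_adjacent(text: str, i: int) -> str:
    """Character-level perturbation: swap characters i and i+1,
    mimicking a typo-style adversarial example."""
    if i < 0 or i + 1 >= len(text):
        return text
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def augment(sentence: str, positions) -> list:
    """Produce one artificial training example per requested swap position."""
    return [swap_adjacent(sentence, i) for i in positions]
```

Such perturbed copies enlarge the training set for low-resource languages, and training against them is what makes the resulting models adversarially robust.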
arXiv Detail & Related papers (2023-06-13T15:23:44Z) - Pre-trained Embeddings for Entity Resolution: An Experimental Analysis
[Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embeddings vectors.
Second, we investigate their blocking performance, performing a detailed scalability analysis, and comparing them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
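Embedding-based blocking, as benchmarked above, prunes the quadratic space of candidate entity pairs before the matching stage by keeping only pairs whose embeddings are sufficiently similar. A minimal sketch (the cosine threshold and toy vectors are assumptions for illustration, not the benchmark's actual setup):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def block_pairs(embeddings, threshold=0.8):
    """Keep only entity pairs whose embedding similarity clears the
    threshold -- the candidate set handed to the matching stage."""
    ids = list(embeddings)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if cosine(embeddings[a], embeddings[b]) >= threshold
    ]
```

The vectorization overhead the paper measures is the cost of producing the `embeddings` dict in the first place; scalability then hinges on avoiding this all-pairs loop, e.g. via approximate nearest-neighbor indexes.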
arXiv Detail & Related papers (2023-04-24T08:53:54Z) - UniASM: Binary Code Similarity Detection without Fine-tuning [2.2329530239800035]
We propose a novel rich-semantic function representation technique to ensure the model captures the intricate nuances of binary code. We introduce the first UniLM-based binary code embedding model, named UniASM, which includes two newly designed training tasks. The experimental results show that UniASM outperforms the state-of-the-art (SOTA) approaches on the evaluation datasets.
arXiv Detail & Related papers (2022-10-28T14:04:57Z) - A Large Scale Search Dataset for Unbiased Learning to Rank [51.97967284268577]
We introduce the Baidu-ULTR dataset for unbiased learning to rank.
It comprises 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries.
It provides: (1) the original semantic feature and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract; and (3) rich user feedback on search result pages (SERPs) like dwelling time.
arXiv Detail & Related papers (2022-07-07T02:37:25Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Robustness Gym: Unifying the NLP Evaluation Landscape [91.80175115162218]
Deep neural networks are often brittle when deployed in real-world systems.
Recent research has focused on testing the robustness of such models.
We propose a solution in the form of Robustness Gym, a simple and extensible evaluation toolkit.
arXiv Detail & Related papers (2021-01-13T02:37:54Z) - Sentiment Analysis for Sinhala Language using Deep Learning Techniques [1.0499611180329804]
This paper presents a comprehensive study on the use of standard sequence models such as RNN, LSTM, Bi-LSTM, and capsule networks.
A dataset of 15,059 Sinhala news comments annotated with these four classes, together with a corpus of 9.48 million tokens, is publicly released.
arXiv Detail & Related papers (2020-11-14T12:02:30Z) - Bidirectional Encoder Representations from Transformers (BERT): A
sentiment analysis odyssey [0.0]
The study puts forth two key insights: (1) relative efficacy of four highly advanced and widely used sentiment analysis techniques; and (2) undisputed superiority of pre-trained advanced supervised deep learning BERT model in sentiment analysis from text data.
We use a publicly available labeled corpus of 50,000 movie reviews originally posted on the Internet Movie Database (IMDB) for analysis using the SentiWordNet lexicon, logistic regression, LSTM, and BERT.
arXiv Detail & Related papers (2020-07-02T14:23:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.