CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis
- URL: http://arxiv.org/abs/2507.14022v1
- Date: Fri, 18 Jul 2025 15:41:53 GMT
- Title: CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis
- Authors: Jianfei Li, Kevin Kam Fung Yuen
- Abstract summary: This study proposes the Cognitive Pairwise Comparison Classification Model Selection (CPC-CMS) framework for document-level sentiment analysis. The CPC, based on expert knowledge judgment, is used to calculate the weights of evaluation criteria, including accuracy, precision, recall, F1-score, specificity, Matthews Correlation Coefficient (MCC), Cohen's Kappa (Kappa), and efficiency. Three open datasets of social media are used to demonstrate the feasibility of the proposed CPC-CMS.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study proposes the Cognitive Pairwise Comparison Classification Model Selection (CPC-CMS) framework for document-level sentiment analysis. The CPC, based on expert knowledge judgment, is used to calculate the weights of evaluation criteria, including accuracy, precision, recall, F1-score, specificity, Matthews Correlation Coefficient (MCC), Cohen's Kappa (Kappa), and efficiency. Naive Bayes, Linear Support Vector Classification (LSVC), Random Forest, Logistic Regression, Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and A Lite Bidirectional Encoder Representations from Transformers (ALBERT) are chosen as baseline classification models. A weighted decision matrix, consisting of classification evaluation scores weighted by the criteria weights, is formed to select the best classification model for a given classification problem. Three open datasets of social media are used to demonstrate the feasibility of the proposed CPC-CMS. In our simulation, when the time factor is excluded from the evaluation, ALBERT is the best model on all three datasets; when time consumption is included, no single model always outperforms the others. The CPC-CMS can be applied to other classification applications in different areas.
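The selection step the abstract describes reduces to a weighted decision matrix over normalised criterion scores. A minimal sketch follows, assuming CPC-derived weights are already available; the weights, models, and scores below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Illustrative sketch of selecting a classifier from a weighted decision
# matrix. Criteria weights would come from the CPC expert-judgment stage;
# every number here is a placeholder, not a result from the paper.
criteria = ["accuracy", "f1_score", "mcc", "time_seconds"]
benefit = [True, True, True, False]       # time is a cost criterion: lower is better
weights = np.array([0.4, 0.3, 0.2, 0.1])  # assumed CPC-derived weights (sum to 1)

models = ["NaiveBayes", "LSTM", "ALBERT"]
scores = np.array([
    [0.81, 0.80, 0.62,  12.0],
    [0.86, 0.85, 0.70, 240.0],
    [0.90, 0.89, 0.78, 900.0],
])

# Min-max normalise each criterion column, inverting cost criteria so that
# higher is always better before weighting.
norm = np.empty_like(scores)
for j, is_benefit in enumerate(benefit):
    col = scores[:, j]
    span = col.max() - col.min()
    norm[:, j] = (col - col.min()) / span if span else 1.0
    if not is_benefit:
        norm[:, j] = 1.0 - norm[:, j]

aggregate = norm @ weights                # one weighted score per model
print(dict(zip(models, aggregate.round(3))))
print("selected:", models[int(np.argmax(aggregate))])
```

With the time column included, a fast but weaker model can overtake ALBERT depending on the weights, which matches the abstract's observation that no single model always wins.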
Related papers
- Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification [3.1850615666574806]
This study investigates how consistent different metrics are at evaluating models across data of different prevalence.
I find that evaluation metrics that are less influenced by prevalence offer more consistent evaluation of individual models and more consistent ranking of a set of models.
arXiv Detail & Related papers (2024-08-19T17:52:38Z)
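A toy simulation (my illustration, not code from the paper) of the consistency claim above: the same scorer is evaluated at several positive-class prevalences, and ROC AUC stays put while threshold-based metrics drift with the base rate:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score

rng = np.random.default_rng(0)

def evaluate(prevalence, n=100_000):
    # Same per-class score distribution at every prevalence.
    y = (rng.random(n) < prevalence).astype(int)
    scores = rng.normal(loc=y.astype(float), scale=1.0)
    preds = (scores > 0.0).astype(int)  # fixed decision threshold
    return (roc_auc_score(y, scores),
            accuracy_score(y, preds),
            precision_score(y, preds))

for p in (0.5, 0.1, 0.01):
    auc, acc, prec = evaluate(p)
    print(f"prevalence={p:<5} AUC={auc:.3f} accuracy={acc:.3f} precision={prec:.3f}")
```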
- Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems [0.9976432338233169]
We evaluate the similarity of embedding models within the context of RAG systems.
We compare different families of embedding models, including proprietary ones, across five datasets.
We identify possible open-source alternatives to proprietary models, with Mistral exhibiting the highest similarity to OpenAI models.
arXiv Detail & Related papers (2024-07-11T08:24:16Z)
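One hedged way to make "embedding model similarity" concrete, in the spirit of the entry above: measure how much two models' top-k retrieval results overlap. The random-projection encoders below are stand-ins so the sketch runs self-contained; a real comparison would plug in actual embedding models:

```python
import numpy as np

rng = np.random.default_rng(1)
docs = rng.normal(size=(1000, 32))                    # pretend corpus features
model_a = rng.normal(size=(32, 64))                   # stand-in encoder A
model_b = model_a + 0.1 * rng.normal(size=(32, 64))   # similar stand-in encoder B

def top_k(query, docs, proj, k=10):
    # Cosine-similarity retrieval in the projected embedding space.
    q, d = query @ proj, docs @ proj
    sims = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    return set(np.argsort(-sims)[:k])

overlaps = []
for _ in range(50):
    q = rng.normal(size=32)
    a, b = top_k(q, docs, model_a), top_k(q, docs, model_b)
    overlaps.append(len(a & b) / len(a | b))  # Jaccard similarity of result sets
print(f"mean top-10 Jaccard overlap: {np.mean(overlaps):.3f}")
```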
- Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS).
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
- Leveraging Uncertainty Estimates To Improve Classifier Performance [4.4951754159063295]
Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen to meet the application requirements.
However, model scores are often not aligned with the true positivity rate.
This is especially true when the training involves a differential sampling across classes or there is distributional drift between train and test settings.
arXiv Detail & Related papers (2023-11-20T12:40:25Z)
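A minimal sketch of the misalignment problem described above and one standard remedy, isotonic calibration on held-out data (my illustration; the paper's own method may differ):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
y_val = (rng.random(5000) < 0.2).astype(int)          # 20% base rate
z = (2 * y_val - 1) + rng.normal(0, 1, 5000) + 1.5    # informative but biased scores
raw = 1 / (1 + np.exp(-z))

# Map raw scores to empirical positivity rates on the validation set, so a
# probability threshold recovers its intended meaning.
iso = IsotonicRegression(out_of_bounds="clip")
calibrated = iso.fit_transform(raw, y_val)

print("mean raw score:", raw.mean().round(3), "| base rate:", y_val.mean().round(3))
print("mean calibrated:", calibrated.mean().round(3))
```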
- ProTeCt: Prompt Tuning for Taxonomic Open Set Classification [59.59442518849203]
Few-shot adaptation methods do not fare well in the taxonomic open set (TOS) setting.
We propose Prompt Tuning for Hierarchical Consistency (ProTeCt), a technique that calibrates the hierarchical consistency of model predictions across label-set granularities.
arXiv Detail & Related papers (2023-06-04T02:55:25Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
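A generic sketch of entropy regularisation for a parametric classifier, one common form of the idea named above (not necessarily the paper's exact loss): the mean prediction over unlabelled data is pushed toward high entropy so the classifier cannot collapse onto a few classes:

```python
import torch
import torch.nn.functional as F

def loss_fn(logits_lab, targets, logits_unlab, reg_weight=1.0):
    ce = F.cross_entropy(logits_lab, targets)            # supervised term
    mean_p = F.softmax(logits_unlab, dim=1).mean(dim=0)  # avg unlabelled prediction
    entropy = -(mean_p * torch.log(mean_p + 1e-8)).sum()
    return ce - reg_weight * entropy                     # subtract: maximise entropy

logits_lab = torch.randn(8, 10, requires_grad=True)
logits_unlab = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
print(loss_fn(logits_lab, targets, logits_unlab).item())
```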
- Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate [4.812468844362369]
We introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class.
We show that classification decisions made by simply sorting objects across classes in descending order of their mLPRs can, in theory, respect the class hierarchy. Since mLPRs must be estimated in practice, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy.
arXiv Detail & Related papers (2022-05-16T17:43:35Z)
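A simplified reading of the sorting-based decision rule in the entry above (illustrative pseudologic, not the authors' HierRank implementation): rank all (object, class) pairs by estimated mLPR and accept them greedily, admitting a class only after its parent:

```python
# Hypothetical two-level taxonomy and made-up mLPR estimates.
parents = {"dog": "animal", "cat": "animal", "animal": None}
mlpr = {
    ("x1", "animal"): 0.9, ("x1", "dog"): 0.8,
    ("x2", "dog"): 0.7, ("x2", "animal"): 0.4, ("x2", "cat"): 0.2,
}

accepted = set()
for (obj, cls), score in sorted(mlpr.items(), key=lambda kv: -kv[1]):
    parent = parents[cls]
    if parent is None or (obj, parent) in accepted:
        accepted.add((obj, cls))  # hierarchy respected: parent already accepted

print(sorted(accepted))
```

A real system would also cut the ranked list at a threshold; the point here is only that descending-mLPR order plus the parent check keeps decisions hierarchy-consistent.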
- Rank4Class: A Ranking Formulation for Multiclass Classification [26.47229268790206]
Multiclass classification (MCC) is a fundamental machine learning problem.
We show that it is easy to boost MCC performance with a novel formulation through the lens of ranking.
arXiv Detail & Related papers (2021-12-17T19:22:37Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
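A rough sketch of the calibration idea summarised above (illustrative, not the authors' implementation): approximate each class's feature distribution with a Gaussian, sample virtual representations from it, and refit only the classifier head on the synthetic features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 16))     # stand-in penultimate-layer features
labels = rng.integers(0, 5, size=500)

virtual_x, virtual_y = [], []
for c in range(5):
    xc = features[labels == c]
    mu, cov = xc.mean(axis=0), np.cov(xc, rowvar=False)
    virtual_x.append(rng.multivariate_normal(mu, cov, size=200))  # virtual reps
    virtual_y.append(np.full(200, c))

# Refit the classifier head on virtual features instead of raw client data.
head = LogisticRegression(max_iter=1000)
head.fit(np.vstack(virtual_x), np.concatenate(virtual_y))
print(head.score(features, labels))
```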
- Active Learning++: Incorporating Annotator's Rationale using Local Model Explanation [84.10721065676913]
Annotators can provide their rationale for choosing a label by ranking input features based on their importance for a given query.
Instead of weighing all committee models equally to select the next instance, we assign higher weight to the committee model with higher agreement with the annotator's ranking.
This approach is applicable to any kind of ML model, using model-agnostic techniques such as LIME to generate local explanations.
arXiv Detail & Related papers (2020-09-06T08:07:33Z)
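A hedged sketch of the committee-weighting idea above. In the paper the feature rankings come from local explanations such as LIME; here they are hard-coded so the snippet runs, and Kendall's tau stands in for the agreement measure:

```python
import numpy as np
from scipy.stats import kendalltau

annotator_rank = [0, 2, 1, 3]   # annotator's feature-importance ordering
committee_ranks = [             # one (hypothetical) feature ranking per member
    [0, 2, 1, 3],
    [3, 2, 1, 0],
    [0, 1, 2, 3],
]

# Higher rank agreement with the annotator -> higher vote weight.
taus = np.array([kendalltau(annotator_rank, r)[0] for r in committee_ranks])
weights = np.clip(taus, 0, None)
weights = weights / weights.sum()
print(weights.round(3))
```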
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.