Performance of diverse evaluation metrics in NLP-based assessment and text generation of consumer complaints
- URL: http://arxiv.org/abs/2506.21623v1
- Date: Mon, 23 Jun 2025 17:26:38 GMT
- Title: Performance of diverse evaluation metrics in NLP-based assessment and text generation of consumer complaints
- Authors: Peiheng Gao, Chen Yang, Ning Sun, Ričardas Zitikis
- Abstract summary: This study addresses these issues by incorporating human-experience-trained algorithms that effectively recognize subtle semantic differences crucial for assessing consumer relief eligibility. We propose integrating synthetic data generation methods that utilize expert evaluations of generative adversarial networks and are refined through expert annotations.
- Score: 3.447707016041768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) has significantly advanced text classification by enabling automated understanding and categorization of complex, unstructured textual data. However, accurately capturing nuanced linguistic patterns and contextual variations inherent in natural language, particularly within consumer complaints, remains a challenge. This study addresses these issues by incorporating human-experience-trained algorithms that effectively recognize subtle semantic differences crucial for assessing consumer relief eligibility. Furthermore, we propose integrating synthetic data generation methods that utilize expert evaluations of generative adversarial networks and are refined through expert annotations. By combining expert-trained classifiers with high-quality synthetic data, our research seeks to significantly enhance machine learning classifier performance, reduce dataset acquisition costs, and improve overall evaluation metrics and robustness in text classification tasks.
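To make the proposed pipeline concrete, below is a minimal sketch of augmenting an expert-labeled complaint dataset with synthetic examples and comparing classifier metrics with and without augmentation. The file names, column names, and the TF-IDF plus logistic regression model are illustrative assumptions, not the classifiers or generators described in the paper.

```python
# Minimal sketch: augmenting an expert-labeled complaint dataset with
# synthetic examples and measuring the effect on a simple classifier.
# CSV paths, column names, and the TF-IDF + logistic regression model
# are illustrative stand-ins, not the paper's actual setup.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

real = pd.read_csv("complaints_expert_labeled.csv")   # columns: text, relief_eligible
synthetic = pd.read_csv("complaints_synthetic.csv")   # generated texts, expert-annotated

X_train, X_test, y_train, y_test = train_test_split(
    real["text"], real["relief_eligible"], test_size=0.2, random_state=0,
    stratify=real["relief_eligible"])

def evaluate(train_texts, train_labels):
    clf = make_pipeline(TfidfVectorizer(min_df=2, ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(train_texts, train_labels)
    return f1_score(y_test, clf.predict(X_test), average="macro")

baseline = evaluate(X_train, y_train)
augmented = evaluate(pd.concat([X_train, synthetic["text"]]),
                     pd.concat([y_train, synthetic["relief_eligible"]]))
print(f"macro-F1 real only: {baseline:.3f}, real + synthetic: {augmented:.3f}")
```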
Related papers
- Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning [63.531262595858]
A divide-and-conquer approach breaks the comprehensive evaluation task into localized scoring tasks, followed by a final global assessment. We introduce a hybrid in-context learning approach that leverages human annotations to enhance the performance of both local and global evaluations. Finally, we develop an uncertainty-based active learning algorithm that efficiently selects data samples for human annotation.
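As a rough illustration of the uncertainty-based selection step, the sketch below picks the unlabeled samples with the highest predictive entropy for human annotation. The acquisition function and batch size are generic assumptions, not the paper's exact algorithm.

```python
# Sketch of uncertainty-based sample selection for human annotation.
# Entropy acquisition and batch size are generic choices, not the
# paper's exact method.
import numpy as np

def select_for_annotation(probs: np.ndarray, batch_size: int = 16) -> np.ndarray:
    """Return indices of the `batch_size` samples whose predicted class
    distributions have the highest entropy (most uncertain)."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(entropy)[::-1][:batch_size]

# Example: model probabilities over 3 quality labels for 5 unlabeled texts.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.34, 0.33, 0.33],
                  [0.60, 0.30, 0.10],
                  [0.50, 0.50, 0.00],
                  [0.20, 0.40, 0.40]])
print(select_for_annotation(probs, batch_size=2))  # most uncertain first
```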
arXiv Detail & Related papers (2025-05-26T16:39:41Z) - Detecting Spelling and Grammatical Anomalies in Russian Poetry Texts [0.0]
The quality of natural language texts in fine-tuning datasets plays a critical role in the performance of generative models. We propose the use of automated linguistic anomaly detection to identify and filter out low-quality texts from training datasets for creative models. Our work aims to empower the community with tools and insights to improve the quality of training datasets for generative models in creative domains.
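A simplified sketch of such filtering is given below, using the fraction of out-of-vocabulary tokens as a stand-in anomaly score. The reference vocabulary and threshold are placeholders; the paper's actual spelling and grammar detectors for Russian poetry are more sophisticated.

```python
# Simplified sketch: filter fine-tuning texts by an anomaly rate, here
# approximated as the fraction of tokens missing from a reference
# vocabulary. Vocabulary and threshold are placeholders.
import re

REFERENCE_VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def anomaly_rate(text: str) -> float:
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 1.0
    unknown = sum(1 for t in tokens if t not in REFERENCE_VOCAB)
    return unknown / len(tokens)

def filter_corpus(texts: list[str], max_rate: float = 0.2) -> list[str]:
    """Keep only texts whose anomaly rate is at or below the threshold."""
    return [t for t in texts if anomaly_rate(t) <= max_rate]

corpus = ["the quick brown fox jumps over the lazy dog",
          "teh qick brwn fox jmps ovr the lazy dog"]
print(filter_corpus(corpus))  # drops the heavily misspelled line
```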
arXiv Detail & Related papers (2025-05-07T15:27:59Z) - ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z) - A Systematic Review of Data-to-Text NLG [2.4769539696439677]
Methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation.
Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages.
arXiv Detail & Related papers (2024-02-13T14:51:45Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis [0.0]
This study presents a comparative analysis of machine learning and transformer-based approaches for deceptive text classification.
We investigate the effectiveness of traditional machine learning algorithms and state-of-the-art transformer models, such as BERT, XLNet, DistilBERT, and RoBERTa.
The results of this study shed light on the strengths and limitations of machine learning and transformer-based methods for deceptive text classification.
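For illustration, the sketch below covers the traditional-ML side of such a comparison, cross-validating a few classic models on TF-IDF features. The dataset path and columns are hypothetical, and the transformer baselines would typically be fine-tuned separately, for example with the Hugging Face Transformers library.

```python
# Sketch of the traditional-ML side of a deceptive-text classification
# comparison: classic models cross-validated on TF-IDF features.
# Dataset path and columns are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("deceptive_reviews.csv")   # columns: text, is_deceptive

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
}
for name, model in models.items():
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), model)
    scores = cross_val_score(pipe, df["text"], df["is_deceptive"],
                             cv=5, scoring="f1_macro")
    print(f"{name}: macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```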
arXiv Detail & Related papers (2023-08-10T10:07:00Z) - DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering [95.89707479748161]
Existing evaluation metrics for natural language generation (NLG) tasks face the challenges on generalization ability and interpretability.
We propose a metric called DecompEval that formulates NLG evaluation as an instruction-style question answering task.
We decompose our devised instruction-style question about the quality of generated texts into subquestions that measure the quality of each sentence.
The subquestions with their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result.
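A simplified sketch of this decompose-then-recompose idea: ask an instruction-tuned PLM a yes/no subquestion about each sentence and average the answers into a score. The model choice, prompt wording, and aggregation below are illustrative assumptions, not the paper's exact setup.

```python
# Simplified DecompEval-style metric: query an instruction-tuned PLM with a
# yes/no subquestion per sentence and recompose the answers into one score.
from transformers import pipeline

qa = pipeline("text2text-generation", model="google/flan-t5-small")

def decomposed_score(generated_text: str, dimension: str = "fluent") -> float:
    sentences = [s.strip() for s in generated_text.split(".") if s.strip()]
    yes_count = 0
    for sent in sentences:
        prompt = (f"Question: Is the following sentence {dimension}? "
                  f"Answer yes or no.\nSentence: {sent}\nAnswer:")
        answer = qa(prompt, max_new_tokens=4)[0]["generated_text"].lower()
        yes_count += answer.strip().startswith("yes")
    return yes_count / max(len(sentences), 1)

print(decomposed_score("The service was prompt. The the refund arrive never."))
```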
arXiv Detail & Related papers (2023-07-13T16:16:51Z) - Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z) - Paraphrase Detection: Human vs. Machine Content [3.8768839735240737]
Human-authored paraphrases exceed machine-generated ones in terms of difficulty, diversity, and similarity.
Transformers emerged as the most effective method across datasets, with TF-IDF excelling on semantically diverse corpora.
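The TF-IDF component mentioned above can be sketched as cosine similarity between TF-IDF vectors of a candidate pair, thresholded to flag paraphrases; the threshold below is a placeholder to be tuned per corpus.

```python
# Sketch: score candidate pairs by cosine similarity of TF-IDF vectors and
# threshold the score to flag paraphrases. Threshold is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pairs = [
    ("The bank denied my refund request.",
     "My request for a refund was rejected by the bank."),
    ("The bank denied my refund request.",
     "I enjoy hiking in the mountains."),
]
vectorizer = TfidfVectorizer().fit([t for pair in pairs for t in pair])
for a, b in pairs:
    X = vectorizer.transform([a, b])
    score = cosine_similarity(X[0], X[1])[0, 0]
    label = "paraphrase" if score >= 0.5 else "not a paraphrase"
    print(f"similarity={score:.2f} -> {label}")
```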
arXiv Detail & Related papers (2023-03-24T13:25:46Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint).
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
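As a minimal illustration of transformation-based robustness testing, the sketch below applies a home-rolled character-swap perturbation to an input text. This is not the TextFlint API; a full evaluation would also cover task-specific transformations, adversarial attacks, and subpopulations.

```python
# Minimal illustration of transformation-based robustness testing: a
# home-rolled character perturbation, not the TextFlint API.
import random

def swap_adjacent_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly swap adjacent letters to simulate typo noise."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

original = "I was charged twice and the bank refuses to reverse the fee."
perturbed = swap_adjacent_chars(original, rate=0.15)
print(perturbed)
# Compare model predictions on `original` vs `perturbed` to estimate
# robustness to this transformation.
```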
arXiv Detail & Related papers (2021-03-21T17:20:38Z) - Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks [0.30458514384586394]
This paper explores the use of Deep Learning methods for automatic estimation of quality of human translations.
Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly.
arXiv Detail & Related papers (2020-03-13T16:57:55Z)