Related papers: Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

URL: http://arxiv.org/abs/2406.17163v1
Date: Mon, 24 Jun 2024 22:30:26 GMT
Title: Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors
Authors: Vikas Yadav, Zheng Tang, Vijay Srinivasan,
Abstract summary: We show that large language models (LLMs) can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. We introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries). We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and that it reduces the critical misclassification and hallucinated label generation errors.
Score: 19.601600598570215
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) have achieved remarkable success in natural language generation, but less attention has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and finally aggregates all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classification datasets, CLINC and Banking, and show 22.7% and 15.1% error reduction, respectively. We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and that it reduces the critical misclassification and hallucinated label generation errors.
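The paraphrase, classify, and aggregate steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the aggregation rule, so the sketch assumes that confidence scores are summed per label and that labels outside the known label set are discarded. The names `aggregate_labels`, `pag_classify`, `paraphrase`, and `classify` are hypothetical, and the two callables stand in for LLM calls.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional, Set, Tuple

# A prediction is a (label, confidence) pair, as a classifier LLM might return it.
Prediction = Tuple[str, float]


def aggregate_labels(
    predictions: Iterable[Prediction], valid_labels: Set[str]
) -> Optional[str]:
    """Pick the label with the highest summed confidence.

    Assumption: summing confidences is one plausible aggregation rule;
    the source only says labels are aggregated "based on their confidence scores".
    Out-of-vocabulary labels are dropped before aggregation.
    """
    totals = defaultdict(float)
    for label, confidence in predictions:
        if label in valid_labels:
            totals[label] += confidence
    if not totals:
        return None  # every prediction was out of vocabulary
    return max(totals, key=totals.get)


def pag_classify(
    query: str,
    paraphrase: Callable[[str, int], Iterable[str]],  # hypothetical LLM paraphraser
    classify: Callable[[str], Prediction],            # hypothetical LLM classifier
    valid_labels: Set[str],
    num_paraphrases: int = 3,
) -> Optional[str]:
    """Classify the original query and its paraphrases, then aggregate."""
    queries = [query] + list(paraphrase(query, num_paraphrases))
    predictions = [classify(q) for q in queries]
    return aggregate_labels(predictions, valid_labels)


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model.
    labels = {"card_lost", "balance"}

    def fake_paraphrase(q: str, n: int):
        return [f"{q} (variant {i})" for i in range(n)]

    def fake_classify(q: str) -> Prediction:
        # The original query gets a hallucinated label; paraphrases do not.
        return ("lost_my_card", 0.9) if "variant" not in q else ("card_lost", 0.6)

    print(pag_classify("I can't find my card", fake_paraphrase, fake_classify, labels))
```

In the stub run, the hallucinated label from the original query is filtered out and the paraphrase predictions decide the result, which mirrors the error type the abstract says PAG-LLM reduces. A majority vote or a max-confidence rule would be equally simple alternatives under the same interface.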

Related papers

On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization [54.965787768076254]
Large Language Models have recently been exploited as judges for complex natural language processing tasks, such as Q&A. We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization.
arXiv Detail & Related papers (2025-07-22T13:40:26Z)
Evaluating how LLM annotations represent diverse views on contentious topics [3.405231040967506]
We show how generative large language models (LLMs) represent diverse viewpoints on contentious labeling tasks. Our findings suggest that when using LLMs to annotate data, under-representing the views of particular groups is not a substantial concern.
arXiv Detail & Related papers (2025-03-29T22:53:15Z)
Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs). We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy. We leverage LLMs as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
Text Classification in the LLM Era - Where do we stand? [2.7624021966289605]
Large Language Models revolutionized NLP and showed dramatic performance improvements across several tasks. We investigated the role of such language models in text classification and how they compare with other approaches.
arXiv Detail & Related papers (2025-02-17T14:25:54Z)
Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy [5.225010551503337]
This paper proposes a data quality enhancement (DQE) method for text classification based on large language models (LLMs). Experimental results demonstrate that our method effectively enhances the performance of LLMs in text classification tasks. Our method has achieved state-of-the-art performance in several open-source classification tasks.
arXiv Detail & Related papers (2024-12-09T15:28:39Z)
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process. We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency. Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
arXiv Detail & Related papers (2024-10-24T16:27:03Z)
SkillAggregation: Reference-free LLM-Dependent Aggregation [14.46141987797362]
Large Language Models (LLMs) are increasingly used to assess NLP tasks. Recent work suggests using multiple LLMs as judges yields improved performance. This work focuses on aggregating predictions from multiple systems where no reference labels are available.
arXiv Detail & Related papers (2024-10-14T07:13:47Z)
Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. This study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z)
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification [53.89380284760555]
FOCI (Fine-grained Object ClassIfication) is a difficult multiple-choice benchmark for fine-grained object classification. FOCI complements five popular classification datasets with four domain-specific subsets from ImageNet-21k.
arXiv Detail & Related papers (2024-06-20T16:59:39Z)
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition [78.97487780589574]
Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories. This paper introduces a Retrieving And Ranking augmented method for MLLMs. Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base.
arXiv Detail & Related papers (2024-03-20T17:59:55Z)
Generalized Category Discovery with Large Language Models in the Loop [10.440661581492723]
We propose Loop, an end-to-end active-learning framework that introduces Large Language Models into the training loop. We show that Loop outperforms SOTA models by a large margin and generates accurate category names for the discovered clusters.
arXiv Detail & Related papers (2023-12-18T02:55:14Z)
CLAMP: Contrastive LAnguage Model Prompt-tuning [89.96914454453791]
We show that large language models can achieve good image classification performance when adapted this way. Our approach beats state-of-the-art mLLMs by 13% and slightly outperforms contrastive learning with a custom text model.
arXiv Detail & Related papers (2023-12-04T05:13:59Z)
Label Supervised LLaMA Finetuning [13.939718306233617]
In this paper, we introduce a label-supervised adaptation for Large Language Models (LLMs). We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss. Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size.
arXiv Detail & Related papers (2023-10-02T13:53:03Z)
Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs). We formulate a novel task, Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables. We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding. Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.