Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification
- URL: http://arxiv.org/abs/2502.07165v1
- Date: Tue, 11 Feb 2025 01:10:13 GMT
- Title: Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification
- Authors: Peipei Wei, Dimitris Dimitriadis, Yan Xu, Mingwei Shen
- Abstract summary: We present PRINCIPLE-BASED PROMPTING, a simple but effective multi-agent prompting strategy for text classification.
Our approach achieves substantial performance gains (1.55% - 19.37%) over zero-shot prompting on macro-F1 score.
Our multi-agent PRINCIPLE-BASED PROMPTING approach also performs on par with or better than demonstration-based few-shot prompting approaches.
- Score: 4.811763060654019
- License:
- Abstract: We present PRINCIPLE-BASED PROMPTING, a simple but effective multi-agent prompting strategy for text classification. It first asks multiple LLM agents to independently generate candidate principles based on analysis of demonstration samples with or without labels, consolidates them into final principles via a finalizer agent, and then sends them to a classifier agent to perform downstream classification tasks. Extensive experiments on binary and multi-class classification datasets with LLMs of different sizes show that our approach not only achieves substantial performance gains (1.55% - 19.37%) over zero-shot prompting on macro-F1 score but also outperforms other strong baselines (CoT and stepback prompting). Principles generated by our approach help LLMs perform better on classification tasks than human-crafted principles on two private datasets. Our multi-agent PRINCIPLE-BASED PROMPTING approach also performs on par with or better than demonstration-based few-shot prompting approaches, yet with substantially lower inference costs. Ablation studies show that label information and the multi-agent cooperative LLM framework play an important role in generating high-quality principles that facilitate downstream classification tasks.
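The abstract describes a three-role pipeline: principle-generator agents, a finalizer agent, and a classifier agent. No code accompanies this listing, so the sketch below is only an illustration of how such a pipeline could be wired together; the prompt wording, the number of agents, and the generic `complete` callable (any LLM completion API) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a principle-based multi-agent prompting pipeline.
# `complete` stands in for any LLM completion/chat API; prompts and helper
# names are illustrative assumptions, not the paper's released code.
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, label); labels may be omitted in the unlabeled variant


def generate_candidate_principles(
    complete: Callable[[str], str],
    demos: List[Sample],
    n_agents: int = 3,
) -> List[str]:
    """Each agent independently induces candidate principles from the demo samples."""
    demo_block = "\n".join(f"Text: {text}\nLabel: {label}" for text, label in demos)
    prompt = (
        "Analyze the labeled examples below and write general principles that "
        "explain how to assign the correct label.\n\n"
        f"{demo_block}\n\nPrinciples:"
    )
    return [complete(prompt) for _ in range(n_agents)]


def finalize_principles(complete: Callable[[str], str], candidates: List[str]) -> str:
    """A finalizer agent consolidates the candidate sets into one final principle list."""
    joined = "\n\n".join(f"Candidate set {i + 1}:\n{c}" for i, c in enumerate(candidates))
    return complete(
        "Merge the candidate principle sets below into one concise, "
        f"non-redundant list of classification principles.\n\n{joined}"
    )


def classify(
    complete: Callable[[str], str],
    principles: str,
    text: str,
    labels: List[str],
) -> str:
    """The classifier agent applies the final principles to a new input."""
    prompt = (
        f"Principles:\n{principles}\n\n"
        f"Using only these principles, classify the text into one of {labels}.\n"
        f"Text: {text}\nLabel:"
    )
    return complete(prompt).strip()
```

At inference time only the consolidated principles accompany each query, so the per-query prompt stays much shorter than a full set of demonstrations, which is consistent with the lower inference cost the abstract reports relative to demonstration-based few-shot prompting.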
Related papers
- Improve LLM-based Automatic Essay Scoring with Linguistic Features [46.41475844992872]
This paper develops a scoring system capable of handling essays across diverse prompts.
Existing methods typically fall into two categories: supervised feature-based approaches and large language model (LLM)-based methods.
arXiv Detail & Related papers (2025-02-13T17:09:52Z)
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- SkillAggregation: Reference-free LLM-Dependent Aggregation [14.46141987797362]
Large Language Models (LLMs) are increasingly used to assess NLP tasks.
Recent work suggests using multiple LLMs as judges yields improved performance.
This work focuses on aggregating predictions from multiple systems where no reference labels are available.
arXiv Detail & Related papers (2024-10-14T07:13:47Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead.
arXiv Detail & Related papers (2024-06-06T03:46:59Z)
- Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond [24.151927600694066]
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs.
This paper conducts the first comprehensive experiment to investigate how far we have been in applying Large Language Models (LLMs) to generate high-quality commit messages.
arXiv Detail & Related papers (2024-04-23T08:24:43Z)
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z)
- Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompting Method [9.099277246096861]
In this paper, we examine large pre-trained language models (LLMs) with in-context learning (ICL) for news claim verification.
We introduce a Hierarchical Step-by-Step (HiSS) prompting method which directs LLMs to separate a claim into several subclaims and then verify each of them via multiple question-answering steps progressively.
Experiment results on two public misinformation datasets show that HiSS prompting outperforms a state-of-the-art fully supervised approach and strong few-shot ICL-enabled baselines.
arXiv Detail & Related papers (2023-09-30T08:33:04Z)
- LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z)
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.