Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
- URL: http://arxiv.org/abs/2411.00492v1
- Date: Fri, 01 Nov 2024 10:06:52 GMT
- Title: Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
- Authors: Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen
- Abstract summary: We present Multi-expert Prompting, a novel enhancement of ExpertPrompting that improves large language model (LLM) generation.
Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses.
Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness.
- Score: 75.44218111729442
- Abstract: We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thought through our seven carefully designed subtasks derived from the Nominal Group Technique (Ven and Delbecq, 1974), a well-established decision-making framework. Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness. It further achieves state-of-the-art truthfulness by outperforming the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios, eliminating the need for manual prompt construction.
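The abstract describes the procedure only at a high level. As a rough illustration, here is a minimal Python sketch of how a single-pass multi-expert prompt could be assembled and its selected answer extracted; the five steps, the number of experts, and the `chat` helper are illustrative assumptions, not the paper's seven NGT-derived subtasks or exact prompts.

```python
# Minimal sketch of the Multi-expert Prompting idea from the abstract:
# simulate several experts, aggregate their answers, and select the best
# response, all inside one model call. The step wording below is an
# assumption for illustration, not the paper's seven NGT-derived subtasks.

def build_multi_expert_prompt(instruction: str, n_experts: int = 3) -> str:
    """Compose one prompt that walks the model through expert simulation,
    aggregation, and final selection in a single chain of thought."""
    return (
        f"Instruction: {instruction}\n\n"
        f"Step 1. Propose {n_experts} distinct experts whose backgrounds fit this instruction.\n"
        "Step 2. Let each expert answer the instruction independently and concisely.\n"
        "Step 3. List the points the experts agree on and the points where they conflict.\n"
        "Step 4. Resolve the conflicts and merge the agreed points into one aggregated answer.\n"
        "Step 5. Compare the aggregated answer with each individual expert answer and output\n"
        "        only the single best response, prefixed with 'Final answer:'."
    )


def multi_expert_answer(instruction: str, chat) -> str:
    """`chat` is any callable mapping a prompt string to the LLM's reply text
    (e.g., a thin wrapper around an API or local-model client)."""
    reply = chat(build_multi_expert_prompt(instruction))
    # Keep only the selected response; everything before it is the reasoning trace.
    return reply.split("Final answer:", 1)[-1].strip()
```

Because the entire procedure runs in one prompt and one model call, a sketch like this stays cheap relative to multi-call debate or ensembling schemes, which is consistent with the abstract's claims of efficiency and no manual prompt construction.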
Related papers
- X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation [47.96737683498274]
Large Language Models (LLMs) and Large Multimodal Models (LMMs) have been shown to enhance the effectiveness of enriching item descriptions.
This paper introduces a novel framework, Cross-Reflection Prompting, termed X-Reflect, to address limitations by prompting LMMs to explicitly identify and reconcile supportive and conflicting information between text and images.
arXiv Detail & Related papers (2024-08-27T16:10:21Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- GANPrompt: Enhancing Robustness in LLM-Based Recommendations with GAN-Enhanced Diversity Prompts [15.920623515602038]
This paper proposes GANPrompt, a multi-dimensional large language model prompt diversity framework based on Generative Adversarial Networks (GANs).
GANPrompt first trains a generator capable of producing diverse prompts by analysing multidimensional user behavioural data.
These diverse prompts are then used to train the LLM to improve its performance in the face of unseen prompts.
arXiv Detail & Related papers (2024-08-19T03:13:20Z)
- On the Worst Prompt Performance of Large Language Models [93.13542053835542]
Performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z)
- POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models [28.072184039405784]
We present POEM, a visual analytics system that facilitates efficient prompt engineering for large language models (LLMs).
The system enables users to explore the interaction patterns across modalities at varying levels of detail for a comprehensive understanding of the multimodal knowledge elicited by various prompts.
arXiv Detail & Related papers (2024-06-06T08:21:30Z)
- PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization [60.00631098364391]
PromptAgent is an optimization method that crafts expert-level prompts equivalent in quality to those handcrafted by experts.
Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions.
We apply PromptAgent to 12 tasks spanning three practical domains.
arXiv Detail & Related papers (2023-10-25T07:47:01Z)
- PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine [24.888093229577965]
We propose a simple, universal, and automatic method named PREFER to address the stated limitations.
Our PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin.
arXiv Detail & Related papers (2023-08-23T09:46:37Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.