Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing
- URL: http://arxiv.org/abs/2310.13855v1
- Date: Fri, 20 Oct 2023 23:15:59 GMT
- Title: Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing
- Authors: Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis Charles
- Abstract summary: Large language models (LLMs) have made impressive progress in natural language processing.
We propose Evoke, an automatic prompt refinement framework.
In Evoke, there are two instances of the same LLM: one acts as a reviewer and scores the current prompt; the other acts as an author and edits the prompt by considering the edit history and the reviewer's feedback.
- Score: 19.241543540941283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have made impressive progress in natural
language processing. These models rely on proper human instructions (or
prompts) to generate suitable responses. However, the potential of LLMs is not
fully harnessed by commonly used prompting methods: many human-in-the-loop
algorithms employ ad-hoc procedures for prompt selection, while automatic prompt
generation approaches essentially search the space of possible prompts randomly
and inefficiently. We propose Evoke, an automatic prompt refinement framework.
In Evoke, there are two instances of the same LLM: one acts as a reviewer
(LLM-Reviewer) and scores the current prompt; the other acts as an author
(LLM-Author) and edits the prompt by considering the edit history and the
reviewer's feedback. Such an author-reviewer feedback loop ensures that the
prompt is refined in each iteration. We further integrate a data selection
approach into Evoke, where only the hard samples are exposed to the LLM. The hard
samples are more important because the LLM can develop a deeper understanding of
the tasks from them, while the model may already know how to solve the easier
cases. Experimental results show that Evoke significantly outperforms existing
methods. For instance, on the challenging task of logical fallacy detection,
Evoke scores above 80, while all other baseline methods struggle to reach 20.
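As a reading aid, here is a minimal Python sketch of the author-reviewer loop and the hard-sample selection described in the abstract. It assumes a generic text-in/text-out `llm` callable; the prompt templates, the exact-match correctness check, and the iteration count are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any text-in/text-out completion function

def select_hard_samples(llm: LLM, prompt: str,
                        samples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only samples the current prompt still gets wrong
    (exact-match correctness is an illustrative assumption)."""
    return [(x, y) for x, y in samples
            if llm(f"{prompt}\nInput: {x}\nAnswer:").strip() != y]

def evoke(llm: LLM, initial_prompt: str,
          samples: List[Tuple[str, str]], iterations: int = 5) -> str:
    prompt, history = initial_prompt, []
    for _ in range(iterations):
        hard = select_hard_samples(llm, prompt, samples)
        # LLM-Reviewer: score the current prompt and point out its weaknesses.
        review = llm(
            "You are a reviewer. Score this prompt from 0 to 10 and explain "
            f"its weaknesses on the hard examples.\nPrompt: {prompt}\n"
            f"Hard examples: {hard}"
        )
        # LLM-Author: edit the prompt given the edit history and the review.
        prompt = llm(
            "You are an author. Rewrite the prompt to address the review.\n"
            f"Edit history: {history}\nReview: {review}\n"
            f"Current prompt: {prompt}\nNew prompt:"
        )
        history.append((prompt, review))
    return prompt
```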
Related papers
- Prompt Exploration with Prompt Regression [38.847668543140315]
We propose a framework, Prompt Exploration with Prompt Regression (PEPR), to predict the effect of prompt combinations given results for individual prompt elements.
We evaluate our approach with open-source LLMs of different sizes on several different tasks.
arXiv Detail & Related papers (2024-05-17T20:30:49Z)
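A minimal sketch of the prompt-regression idea in the entry above: estimate per-element effects from scores observed for prompts built from individual elements, then predict the score of an unseen combination. The linear model, toy scores, and least-squares fit are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Candidate prompt elements (hypothetical examples).
elements = ["be concise", "think step by step", "cite sources"]

# Rows are prompts encoded as binary indicators over elements;
# y holds the task scores observed for those prompts (toy data).
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
y = np.array([0.61, 0.72, 0.58, 0.79])

# Fit per-element effect weights by least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the score of a combination never evaluated directly.
combo = np.array([1, 1, 1], dtype=float)  # all three elements together
print("predicted score:", combo @ w)
```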
- Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
However, this approach brings an additional computational burden from model inference, as well as the human effort required to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
- LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop [7.77005079649294]
An effective auditing method is to probe a large language model with different versions of the same question.
To operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically.
We propose the LLMAuditor framework, in which a different LLM is used together with human-in-the-loop (HIL) verification.
This approach offers verifiability and transparency, while avoiding circular reliance on the same LLM.
arXiv Detail & Related papers (2024-02-14T17:49:31Z)
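A minimal sketch of the probing idea in the entry above: one LLM generates paraphrased versions of a question, a different target LLM answers each, and disagreement flags the case for human review. Both callables, the paraphrase template, and the exact-match consistency check are hypothetical stand-ins, not the framework's actual interface.

```python
from typing import Callable

def audit(question: str,
          generator_llm: Callable[[str], str],
          target_llm: Callable[[str], str],
          n_probes: int = 3) -> bool:
    # Use a *different* LLM to create probe variants, avoiding circular
    # reliance on the model being audited.
    probes = [
        generator_llm(f"Paraphrase this question (variant {i}): {question}")
        for i in range(n_probes)
    ]
    # Answer every probe with the target model and compare (crude check).
    answers = {target_llm(p).strip().lower() for p in probes}
    consistent = len(answers) == 1
    if not consistent:
        print("Inconsistent answers: route to human-in-the-loop review")
    return consistent
```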
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
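A minimal sketch of an inference-time refinement loop in the spirit of the entry above: a feedback model pinpoints fine-grained errors in a draft, and the LLM revises until no errors remain. The `llm` and `error_pinpointer` callables and the prompt templates are hypothetical stand-ins; the paper's actual search procedure may differ.

```python
from typing import Callable, List

def refine(source: str,
           llm: Callable[[str], str],
           error_pinpointer: Callable[[str, str], List[str]],
           max_steps: int = 4) -> str:
    # Initial draft, e.g. a translation (task choice is illustrative).
    draft = llm(f"Translate to French: {source}")
    for _ in range(max_steps):
        # Fine-grained, actionable feedback, e.g. ["'bank' mistranslated"].
        errors = error_pinpointer(source, draft)
        if not errors:
            break
        draft = llm(
            f"Source: {source}\nDraft: {draft}\n"
            f"Fix exactly these errors and output only the revision: {errors}"
        )
    return draft
```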
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows large language models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
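A minimal sketch of a two-step 'Rephrase and Respond' recipe as summarized in the entry above: the model first rephrases and expands the question, then answers its own rephrasing. The `llm` callable and the exact instruction wording are assumptions.

```python
from typing import Callable

def rephrase_and_respond(question: str, llm: Callable[[str], str]) -> str:
    # Step 1: let the model rephrase and expand the question itself.
    rephrased = llm(
        f"{question}\n"
        "Rephrase and expand the question above so that it is unambiguous."
    )
    # Step 2: answer the model's own rephrasing.
    return llm(f"{rephrased}\nAnswer the question.")
```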
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
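A minimal sketch of a FreshPrompt-style prompt as summarized in the entry above: recent search-engine evidence is placed before the question so the model answers from up-to-date snippets. The `search` and `llm` callables and the template are illustrative assumptions.

```python
from typing import Callable, List

def fresh_prompt(question: str,
                 search: Callable[[str], List[str]],
                 llm: Callable[[str], str],
                 k: int = 5) -> str:
    # Retrieve current evidence, e.g. top-k search snippets with dates.
    snippets = search(question)[:k]
    evidence = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return llm(
        f"Evidence retrieved today:\n{evidence}\n\n"
        f"Question: {question}\n"
        "Answer using the most recent evidence above."
    )
```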
- Revisiting Prompt Engineering via Declarative Crowdsourcing [16.624577543520093]
Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone.
We put forth a vision for declarative prompt engineering.
Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of our approach.
arXiv Detail & Related papers (2023-08-07T18:04:12Z)
- Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
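A minimal sketch of the iterative query-expansion idea in the entry above: starting from the input query, the LLM proposes related queries, the best-scoring candidates are kept as a beam, and the process repeats. The `llm` and `score` callables and the expansion template are hypothetical stand-ins for the paper's method.

```python
from typing import Callable, List

def allies(query: str,
           llm: Callable[[str], str],
           score: Callable[[str], float],
           beam_width: int = 3,
           depth: int = 2) -> List[str]:
    beam = [query]
    for _ in range(depth):
        candidates = list(beam)
        for q in beam:
            # Ask the LLM for related queries that may surface hidden knowledge.
            expansion = llm(f"List {beam_width} queries related to: {q}")
            candidates += [line.strip("- ").strip()
                           for line in expansion.splitlines() if line.strip()]
        # Keep the highest-scoring unique queries (the beam search step).
        beam = sorted(set(candidates), key=score, reverse=True)[:beam_width]
    return beam
```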
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
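A minimal sketch of the Self-Prompting idea in the entry above: the LLM first generates pseudo passages and QA pairs from its own parametric knowledge, which then serve as in-context demonstrations for answering a new question. The `llm` callable, topic list, and templates are illustrative assumptions.

```python
from typing import Callable, List

def self_prompting(question: str,
                   llm: Callable[[str], str],
                   topics: List[str]) -> str:
    demos = []
    for topic in topics:
        # Generate a pseudo passage and a QA pair from parametric knowledge.
        passage = llm(f"Write a short factual passage about {topic}.")
        qa = llm(f"Passage: {passage}\n"
                 "Write one question about the passage and its short answer.")
        demos.append(f"Passage: {passage}\n{qa}")
    # Use the self-generated demonstrations as in-context examples.
    context = "\n\n".join(demos)
    return llm(f"{context}\n\nQuestion: {question}\nShort answer:")
```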