Related papers: Prompt Optimization with Human Feedback

Prompt Optimization with Human Feedback

URL: http://arxiv.org/abs/2405.17346v1
Date: Mon, 27 May 2024 16:49:29 GMT
Title: Prompt Optimization with Human Feedback
Authors: Xiaoqiang Lin, Zhongxiang Dai, Arun Verma, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low,
Abstract summary: We study the problem of prompt optimization with human feedback (POHF) We introduce our algorithm named automated POHF (APOHF) The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances.
Score: 69.95991134172282
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated remarkable performances in various tasks. However, the performance of LLMs heavily depends on the input prompt, which has given rise to a number of recent works on prompt optimization. However, previous works often require the availability of a numeric score to assess the quality of every prompt. Unfortunately, when a human user interacts with a black-box LLM, attaining such a score is often infeasible and unreliable. Instead, it is usually significantly easier and more reliable to obtain preference feedback from a human user, i.e., showing the user the responses generated from a pair of prompts and asking the user which one is preferred. Therefore, in this paper, we study the problem of prompt optimization with human feedback (POHF), in which we aim to optimize the prompt for a black-box LLM using only human preference feedback. Drawing inspiration from dueling bandits, we design a theoretically principled strategy to select a pair of prompts to query for preference feedback in every iteration, and hence introduce our algorithm named automated POHF (APOHF). We apply our APOHF algorithm to various tasks, including optimizing user instructions, prompt optimization for text-to-image generative models, and response optimization with human feedback (i.e., further refining the response using a variant of our APOHF). The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances. Our code can be found at \url{https://github.com/xqlin98/APOHF}.

Related papers

GRP: Goal-Reversed Prompting for Zero-Shot Evaluation with LLMs [14.906150451947443]
Using Large Language Models (LLMs) to evaluate and compare two answers typically involves having LLM-based judges select the better answer. We propose a Goal-Reversed Prompting (GRP) approach for pairwise evaluation that shifts the original task from selecting the better answer to choosing the worse one.
arXiv Detail & Related papers (2025-03-08T09:44:24Z)
Active Learning for Direct Preference Optimization [59.84525302418018]
Direct preference optimization (DPO) is a form of reinforcement learning from human feedback. We propose an active learning framework for DPO, which can be applied to collect human feedback online or to choose the most informative subset of already collected feedback offline.
arXiv Detail & Related papers (2025-03-03T00:36:31Z)
Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter [17.736962215696366]
We introduce single-round instance-level prompt optimization, referred to as question rewriter. By enhancing the intelligibility of human questions for black-box LLMs, our question rewriter improves the quality of generated answers.
arXiv Detail & Related papers (2024-08-20T06:24:47Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
Reinforced Prompt Personalization for Recommendation with Large Language Models [24.360796133889156]
We introduce the concept of instance-wise prompting, aiming at personalizing discrete prompts for individual users. To improve efficiency and quality, RPP personalizes prompts at the sentence level rather than searching in the vast vocabulary word-by-word.
arXiv Detail & Related papers (2024-07-24T09:24:49Z)
Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization [30.56428628397079]
Universal goal hijacking is a kind of prompt injection attack that forces LLMs to return a target malicious response for arbitrary normal user prompts.<n>Previous methods achieve high attack performance while being too cumbersome and time-consuming.<n>We propose a method called POUGH that incorporates an efficient optimization algorithm and two semantics-guided prompt organization strategies.
arXiv Detail & Related papers (2024-05-23T05:31:41Z)
Reinforcement Learning from Human Feedback with Active Queries [67.27150911254155]
Current reinforcement learning approaches often require a large amount of human-labelled preference data. We propose query-efficient RLHF methods, inspired by the success of active learning. Our experiments show that ADPO, while only making about half of queries for human preference, matches the performance of the state-of-the-art DPO method.
arXiv Detail & Related papers (2024-02-14T18:58:40Z)
PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling [20.0605311279483]
We introduce PRompt Optimization in Multi-Step Tasks (PROMST) It incorporates human-designed feedback rules to automatically offer direct suggestions for improvement. It significantly outperforms both human-engineered prompts and several other prompt optimization methods across 11 representative multi-step tasks.
arXiv Detail & Related papers (2024-02-13T16:38:01Z)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts. RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs.
arXiv Detail & Related papers (2023-10-20T16:37:56Z)
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective of query dependency in such optimization. We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker [0.2580765958706853]
Large-scale language model (LLM) is utilized as a zero-shot re-ranker with excellent results. LLM is highly dependent on the prompts, the impact and the optimization of the prompts for the zero-shot re-ranker are not explored yet. We propose a novel discrete prompt optimization method, Constrained Prompt generation (Co-Prompt), with the metric estimating the optimum for re-ranking.
arXiv Detail & Related papers (2023-05-23T06:35:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.