FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
- URL: http://arxiv.org/abs/2402.11811v3
- Date: Wed, 14 Aug 2024 11:47:39 GMT
- Title: FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
- Authors: Junru Lu, Siyu An, Min Zhang, Yulan He, Di Yin, Xing Sun
- Abstract summary: We propose Free-form Instruction-oriented Prompt Optimization (FIPO) to improve the task performance of large language models (LLMs).
FIPO uses a modular APO template that dynamically integrates the naive task instruction, optional instruction responses, and optional ground truth to produce finely optimized prompts.
We validate the FIPO framework across five public benchmarks and six testing models.
- Score: 36.65009632307124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When the quality of naive prompts is carefully optimized by human experts, the task performance of large language models (LLMs) can be significantly improved. However, expert-based prompt optimization is expensive. Hence, some works have proposed Automatic Prompt Optimization (APO), which optimizes naive prompts according to the task outputs of given in-box testing models, with the help of advanced LLMs (e.g., GPT-4) in an ad-hoc way. Although effective, existing schemes suffer from poor generalization ability and privacy risks. To this end, we collect the first large-scale Prompt Optimization Preference dataset (POP), fine-tune offline local LLM-based optimizers, and then fairly test them with various downstream models. Our method allows accurate optimization of the core task instruction part within the naive prompt in a model-agnostic manner, and is thus named Free-form Instruction-oriented Prompt Optimization (FIPO). Specifically, FIPO uses a modular APO template that dynamically integrates the naive task instruction, optional instruction responses, and optional ground truth to produce finely optimized prompts. The POP dataset is meticulously constructed using advanced LLMs and undergoes rigorous cross-validation by human experts and analytical models. Leveraging insights from this data with Tulu2 models and diverse fine-tuning strategies, we validate the efficacy of the FIPO framework across five public benchmarks and six testing models. Code and data are available at: https://github.com/LuJunru/FIPO_Project.
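As a concrete illustration of the modular template idea, the sketch below assembles a meta-prompt from a mandatory naive instruction and the two optional slots (a current response and a ground truth). It is a minimal sketch assuming a generic chat-style optimizer model; the field names and wording are illustrative and are not taken from the released FIPO code or POP data.

```python
# A minimal sketch of the modular APO template idea described above: the naive
# task instruction is mandatory, while the raw response and ground truth are
# optional slots that are only rendered when available. Field names and wording
# are illustrative, not FIPO's released template.
from typing import Optional

def build_meta_prompt(naive_instruction: str,
                      naive_response: Optional[str] = None,
                      ground_truth: Optional[str] = None) -> str:
    """Assemble the meta-prompt fed to the prompt-optimizer model."""
    parts = [
        "You are a prompt optimizer. Rewrite the task instruction below so that",
        "a downstream model can solve the task more reliably.",
        f"\n### Naive instruction\n{naive_instruction}",
    ]
    if naive_response is not None:   # optional: the downstream model's current answer
        parts.append(f"\n### Current response\n{naive_response}")
    if ground_truth is not None:     # optional: the reference answer, if known
        parts.append(f"\n### Ground truth\n{ground_truth}")
    parts.append("\n### Optimized instruction\n")
    return "\n".join(parts)

# Example usage with a hypothetical optimizer model client:
# optimized = optimizer_llm(build_meta_prompt("Add the two numbers.", "7", "12"))
```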
Related papers
- RosePO: Aligning LLM-based Recommenders with Human Values [38.029251417802044]
We propose a general framework -- Recommendation with smoothing personalized Preference Optimization (RosePO)
RosePO better aligns with customized human values during the post-training stage.
Evaluation on three real-world datasets demonstrates the effectiveness of our method.
arXiv Detail & Related papers (2024-10-16T12:54:34Z) - ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood [14.512464277772194]
Aligned Supervised Fine-Tuning (ASFT) is an effective approach that better aligns Large Language Models with pair-wise datasets.
ASFT mitigates the issue where the DPO loss function decreases the probability of generating human-dispreferred data faster than it increases the probability of generating preferred data.
Extensive experiments demonstrate that ASFT is an effective alignment approach, consistently outperforming existing methods.
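For context, the sketch below reproduces the standard pairwise DPO loss that preference-alignment methods of this kind start from; it is only the baseline objective, not ASFT's absolute-likelihood reformulation.

```python
# A minimal sketch of the standard pairwise DPO loss that preference-alignment
# methods such as ASFT build on; this is the textbook DPO objective, not ASFT's
# absolute-likelihood variant.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,    # log pi_theta(y_w | x)
             policy_rejected_logp: torch.Tensor,  # log pi_theta(y_l | x)
             ref_chosen_logp: torch.Tensor,       # log pi_ref(y_w | x)
             ref_rejected_logp: torch.Tensor,     # log pi_ref(y_l | x)
             beta: float = 0.1) -> torch.Tensor:
    """-log sigmoid(beta * ((chosen - ref_chosen) - (rejected - ref_rejected)))."""
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()
```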
arXiv Detail & Related papers (2024-09-14T11:39:13Z) - AIPO: Improving Training Objective for Iterative Preference Optimization [34.24211649396053]
We study iterative preference optimization with synthetic data.
We propose our training objective for iterative preference optimization, namely Agreement-aware Iterative Preference Optimization (AIPO)
arXiv Detail & Related papers (2024-09-13T14:03:49Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO)
ZOPO incorporates a Neural Tangent Kernel-derived Gaussian process into standard zeroth-order optimization for an efficient search for well-performing local optima in prompt optimization.
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
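The sketch below is a loose rendering of GP-guided local search over prompt candidates in the spirit of the ZOPO summary above; it uses a standard RBF-kernel Gaussian process rather than the paper's NTK-derived kernel, and `embed`, `neighbors`, and `score` are hypothetical user-supplied callables.

```python
# A loose sketch of GP-guided local search over prompt candidates: a Gaussian-
# process surrogate ranks unevaluated neighbors so the expensive black-box
# score is queried sparingly. Not the ZOPO algorithm itself.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gp_local_prompt_search(seed_prompt, embed, neighbors, score, steps=10):
    evaluated = {seed_prompt: score(seed_prompt)}
    best = seed_prompt
    for _ in range(steps):
        X = np.array([embed(p) for p in evaluated])
        y = np.array(list(evaluated.values()))
        surrogate = GaussianProcessRegressor().fit(X, y)
        # Propose local candidates around the current best and rank them by an
        # optimistic (mean + std) acquisition value.
        candidates = [p for p in neighbors(best) if p not in evaluated]
        if not candidates:
            break
        mean, std = surrogate.predict(np.array([embed(p) for p in candidates]),
                                      return_std=True)
        pick = candidates[int(np.argmax(mean + std))]
        evaluated[pick] = score(pick)
        if evaluated[pick] > evaluated[best]:
            best = pick
    return best
```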
arXiv Detail & Related papers (2024-03-05T14:18:15Z) - Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts.
We identify two pivotal factors in model parameter learning: update direction and update method.
In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z) - PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling [20.0605311279483]
We introduce PRompt Optimization in Multi-Step Tasks (PROMST)
It incorporates human-designed feedback rules to automatically offer direct suggestions for improvement.
It significantly outperforms both human-engineered prompts and several other prompt optimization methods across 11 representative multi-step tasks.
arXiv Detail & Related papers (2024-02-13T16:38:01Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
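The sketch below is a toy rendering of the query-dependent idea: fit a proxy reward model on offline logs of (query, prompt, outcome) records and then rank candidate prompts per query. The featurization and learner are placeholders, not the Prompt-OIRL recipe.

```python
# A toy sketch of query-dependent prompt selection from offline logs: fit a
# proxy reward model on (query, prompt, outcome) records, then pick the
# highest-scoring candidate prompt for each new query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Offline demonstration logs: (query, prompt, 1 if the answer was correct else 0)
logs = [
    ("What is 17 + 25?", "Let's think step by step.", 1),
    ("What is 17 + 25?", "Answer immediately.", 0),
    ("Is 91 prime?", "Let's think step by step.", 1),
    ("Is 91 prime?", "Answer immediately.", 0),
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([q + " || " + p for q, p, _ in logs])
y = [label for _, _, label in logs]
reward_model = LogisticRegression().fit(X, y)

def select_prompt(query: str, candidate_prompts: list) -> str:
    """Pick the candidate prompt the proxy reward model scores highest for this query."""
    feats = vectorizer.transform([query + " || " + p for p in candidate_prompts])
    scores = reward_model.predict_proba(feats)[:, 1]
    return candidate_prompts[int(scores.argmax())]

print(select_prompt("What is 8 * 9?", ["Let's think step by step.", "Answer immediately."]))
```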
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach to leveraging large language models (LLMs) as optimizers.
In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values.
We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
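The sketch below mirrors the loop described above: at each step the optimizer LLM is shown previously generated solutions with their scores and asked for a better one. `optimizer_llm` and `evaluate` are hypothetical stand-ins for a model call and a task-specific scorer.

```python
# A minimal sketch of an OPRO-style loop: the optimizer LLM sees past solutions
# with their scores (best last) and proposes a new candidate each step.
def opro_loop(optimizer_llm, evaluate, seed_solution: str, steps: int = 10) -> str:
    history = [(seed_solution, evaluate(seed_solution))]
    for _ in range(steps):
        # Meta-prompt: past solutions sorted by score, plus a request for a better one.
        trajectory = "\n".join(f"solution: {s}\nscore: {v:.2f}"
                               for s, v in sorted(history, key=lambda x: x[1]))
        meta_prompt = (f"{trajectory}\n"
                       "Write a new solution that is different from the ones above "
                       "and achieves a higher score.")
        candidate = optimizer_llm(meta_prompt)
        history.append((candidate, evaluate(candidate)))
    return max(history, key=lambda x: x[1])[0]   # best solution found
```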
arXiv Detail & Related papers (2023-09-07T00:07:15Z) - Robust Prompt Optimization for Large Language Models Against Distribution Shifts [80.6757997074956]
Large Language Models (LLMs) have demonstrated significant ability in various Natural Language Processing tasks.
We propose a new problem of robust prompt optimization for LLMs against distribution shifts.
This problem requires that a prompt optimized over a labeled source group also generalize to an unlabeled target group.
arXiv Detail & Related papers (2023-05-23T11:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.