Related papers: Active Preference Inference using Language Models and Probabilistic Reasoning

URL: http://arxiv.org/abs/2312.12009v2
Date: Wed, 26 Jun 2024 15:00:52 GMT
Title: Active Preference Inference using Language Models and Probabilistic Reasoning
Authors: Wasu Top Piriyakulkij, Volodymyr Kuleshov, Kevin Ellis
Abstract summary: We introduce an inference-time algorithm that helps large language models infer user preferences. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines.
Score: 13.523369679010685
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Active inference allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. However, out of the box, these models are not efficient at extracting preferences: the questions they generate are not informative, requiring a high number of user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by using more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and returns questions that optimize expected entropy and expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
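
The expected-entropy criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper the conditional distributions are defined by prompting an LLM, whereas here a small hand-written likelihood table stands in for those prompts. The item set, the question strings, the yes/no answer space, and all function names are assumptions made for illustration only, and the expected model change criterion is not covered.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def posterior(prior, likelihood_yes, answer):
    """Bayes update over candidate items given a yes/no answer.

    likelihood_yes[i] is an assumed P(answer = yes | item i); in the paper
    such conditionals come from prompting an LLM, here they are hard-coded.
    """
    weights = [p * (ly if answer else 1.0 - ly)
               for p, ly in zip(prior, likelihood_yes)]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(prior)

def expected_entropy(prior, likelihood_yes):
    """Entropy of the posterior, averaged over the predicted answers."""
    p_yes = sum(p * ly for p, ly in zip(prior, likelihood_yes))
    result = 0.0
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            result += p_answer * entropy(posterior(prior, likelihood_yes, answer))
    return result

def best_question(prior, questions):
    """Pick the question whose answer is expected to leave the least uncertainty."""
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))

# Hypothetical setup: four candidate products, uniform prior over which one
# the user wants, and three candidate questions with assumed likelihoods.
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "Do you want a laptop?": [1.0, 1.0, 0.0, 0.0],      # splits items in half
    "Is your budget under $50?": [1.0, 0.0, 0.0, 0.0],  # isolates one item
    "Do you shop online?": [1.0, 1.0, 1.0, 1.0],        # uninformative
}

chosen = best_question(prior, questions)
print(chosen)
```

In this toy setup the question that splits the candidates evenly is selected, because it leaves the lowest expected posterior entropy; a question every item answers the same way leaves the prior unchanged and is never preferred.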

Related papers

Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors [1.8352113484137629]
A long-standing challenge in developing accurate recommendation models is simulating user behavior, mainly due to the complex nature of user interactions. We propose an approach to extracting robust user representations using frozen Large Language Models (LLMs) and simulating cost-effective, resource-efficient user agents powered by fine-tuned Small Language Models (SLMs). Our experiments provide compelling empirical evidence of the efficacy of our methods, demonstrating that user agents developed using our approach have the potential to bridge the gap between offline metrics and real-world performance of recommender systems.
arXiv Detail & Related papers (2025-08-18T22:14:57Z)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [65.18157595903124]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z)
Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
MAPLE: A Framework for Active Preference Learning Guided by Large Language Models [9.37268652939886]
We introduce MAPLE, a framework for large language model-guided Bayesian active preference learning. Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
arXiv Detail & Related papers (2024-12-10T05:55:14Z)
LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics. The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML). VML constrains the parameter space to be human-interpretable natural language. We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [33.57497419019826]
Action-Based Contrastive Self-Training (ACT) enables data-efficient dialogue policy learning in multi-turn conversation modeling. We demonstrate ACT's efficacy in data-efficient tuning scenarios, even when there is no action label available. We also propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation.
arXiv Detail & Related papers (2024-05-31T22:44:48Z)
Bayesian Preference Elicitation with Language Models [82.58230273253939]
We introduce OPEN, a framework that uses BOED to guide the choice of informative questions and an LM to extract features. In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.
arXiv Detail & Related papers (2024-03-08T18:57:52Z)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts. RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy. We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound. Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications [0.7832189413179361]
Large Language Models (LLMs) excel in comprehending and generating human-like text. This paper explores strategies for integrating Language Models (LLMs) with Information Retrieval (IR) systems.
arXiv Detail & Related papers (2023-11-21T02:01:01Z)
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following. This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models. To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
arXiv Detail & Related papers (2023-11-18T03:05:43Z)
Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models [11.950478880423733]
Personalization is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. We propose a novel summary-augmented personalization with task-aware user summaries generated by LLMs.
arXiv Detail & Related papers (2023-10-30T23:40:41Z)
PALR: Personalization Aware LLMs for Recommendation [7.407353565043918]
PALR aims to combine user history behaviors (such as clicks, purchases, ratings, etc.) with large language models (LLMs) to generate user preferred items. Our solution outperforms state-of-the-art models on various sequential recommendation tasks.
arXiv Detail & Related papers (2023-05-12T17:21:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.