MAPLE: A Framework for Active Preference Learning Guided by Large Language Models
- URL: http://arxiv.org/abs/2412.07207v2
- Date: Fri, 20 Dec 2024 01:08:20 GMT
- Title: MAPLE: A Framework for Active Preference Learning Guided by Large Language Models
- Authors: Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein,
- Abstract summary: We introduce MAPLE, a framework for large language model-guided Bayesian active preference learning.<n>Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
- Score: 9.37268652939886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational burdens, taxing human supervision, and lack of interpretability. To address these issues, we introduce MAPLE, a framework for large language model-guided Bayesian active preference learning. MAPLE leverages LLMs to model the distribution over preference functions, conditioning it on both natural language feedback and conventional preference learning feedback, such as pairwise trajectory rankings. MAPLE also employs active learning to systematically reduce uncertainty in this distribution and incorporates a language-conditioned active query selection mechanism to identify informative and easy-to-answer queries, thus reducing human burden. We evaluate MAPLE's sample efficiency and preference inference quality across two benchmarks, including a real-world vehicle route planning benchmark using OpenStreetMap data. Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
Related papers
- Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking [21.23826888841565]
We present a novel approach for training small language models for reasoning-intensive document ranking.
We use web data and a teacher LLM to automatically generate high-quality training examples with relevance explanations.
Our model ranks third on the leaderboard while using substantially fewer parameters than other approaches.
arXiv Detail & Related papers (2025-04-04T21:27:48Z) - Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL)
It employs self-generated natural language critiques as feedback to guide the step-level search process.
This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z) - Efficient or Powerful? Trade-offs Between Machine Learning and Deep Learning for Mental Illness Detection on Social Media [0.036136619420474754]
Social media platforms provide valuable insights into mental health trends by capturing user-generated discussions on conditions such as depression, anxiety, and suicidal ideation.
Machine learning (ML) and deep learning (DL) models have been increasingly applied to classify mental health conditions from textual data.
This study evaluates multiple ML models, including logistic regression, random forest, and LightGBM, alongside deep learning architectures such as ALBERT and Gated Recurrent Units (GRUs)
Our findings indicate that ML and DL models achieve comparable classification performance on medium-sized datasets.
arXiv Detail & Related papers (2025-03-03T00:51:41Z) - Self-training Large Language Models through Knowledge Detection [26.831873737733737]
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks.
This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples.
Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects.
arXiv Detail & Related papers (2024-06-17T07:25:09Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score.
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages [1.149936119867417]
Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling.
We propose leveraging the potential of LLMs in the active learning loop for data annotation.
Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data requirements.
arXiv Detail & Related papers (2024-04-02T19:34:22Z) - Bayesian Preference Elicitation with Language Models [82.58230273253939]
We introduce OPEN, a framework that uses BOED to guide the choice of informative questions and an LM to extract features.
In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.
arXiv Detail & Related papers (2024-03-08T18:57:52Z) - Active Preference Inference using Language Models and Probabilistic Reasoning [13.523369679010685]
We introduce an inference-time algorithm that helps large language models infer user preferences.
Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM.
Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines.
arXiv Detail & Related papers (2023-12-19T09:58:54Z) - Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy.
We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound.
Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z) - Large Language Models Are Latent Variable Models: Explaining and Finding
Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z) - Offline RL for Natural Language Generation with Implicit Language Q
Learning [87.76695816348027]
Large language models can be inconsistent when it comes to completing user specified tasks.
We propose a novel RL method, that combines both the flexible utility framework of RL with the ability of supervised learning.
In addition to empirically validating ILQL, we present a detailed empirical analysis situations where offline RL can be useful in natural language generation settings.
arXiv Detail & Related papers (2022-06-05T18:38:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.