Related papers: Users as Annotators: LLM Preference Learning from Comparison Mode

URL: http://arxiv.org/abs/2510.13830v1
Date: Fri, 10 Oct 2025 08:57:34 GMT
Title: Users as Annotators: LLM Preference Learning from Comparison Mode
Authors: Zhongze Cai, Xiaocheng Li
Abstract summary: We consider an alternative approach to collect pairwise preference data -- user annotation from comparison mode. We develop an expectation-maximization algorithm to estimate a latent quality factor of the user.
Score: 9.005226538625474
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pairwise preference data have played an important role in the alignment of large language models (LLMs). Each sample of such data consists of a prompt, two different responses to the prompt, and a binary label indicating which of the two responses is better. The labels are usually annotated by professional human annotators. In this paper, we consider an alternative approach to collect pairwise preference data -- user annotation from comparison mode. With the increasingly wider adoption of LLMs among the population, users are contributing more and more of their preference labels through their daily interactions with the LLMs. The upside of such labels is that users are the best experts in judging the responses to their own queries/prompts, but the downside is the lack of quality control in these labels. In this paper, we consider a new idea of generating two responses from two different models or two different versions of the same model. The asymmetry allows us to make an inference of the user's data quality through our proposed user behavior model. We develop an expectation-maximization algorithm to estimate a latent quality factor of the user, and filter users' annotation data accordingly. The downstream task shows the effectiveness of our approach in both capturing the user behavior and data filtering for LLM alignment.

Related papers

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions [50.70965714314064]
Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions.
arXiv Detail & Related papers (2026-03-04T15:42:43Z)
Can LLM Annotations Replace User Clicks for Learning to Rank? [112.2254432364736]
Large-scale supervised data is essential for training modern ranking models, but obtaining high-quality human annotations is costly. Click data has been widely used as a low-cost alternative, and with recent advances in large language models (LLMs), LLM-based relevance annotation has emerged as another promising annotation. Experiments on both a public dataset, TianGong-ST, and an industrial dataset, Baidu-Click, show that click-supervised models perform better on high-frequency queries. We explore two training strategies -- data scheduling and frequency-aware multi-objective learning -- that integrate both supervision signals.
arXiv Detail & Related papers (2025-11-10T02:26:14Z)
Beyond Single Labels: Improving Conversational Recommendation through LLM-Powered Data Augmentation [18.01518720663732]
Conversational recommender systems (CRSs) enhance recommendation quality by engaging users in multi-turn dialogues. CRSs often face the false negative issue, where items that a user might like are incorrectly labeled as negative during training, leading to suboptimal recommendations. We propose a novel data augmentation framework that first leverages an LLM-based semantic retriever to identify diverse and semantically relevant items.
arXiv Detail & Related papers (2025-07-30T08:20:54Z)
LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation [12.89199121698673]
Large language models (LLMs) show significant potential for multi-interest analysis due to their extensive knowledge and powerful reasoning capabilities. We propose an LLM-driven dual-level multi-interest modeling framework for more effective recommendation. Experiments on real-world datasets show the superiority of our approach against state-of-the-art methods.
arXiv Detail & Related papers (2025-07-15T02:13:54Z)
Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z)
HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation [24.67727411391369]
HyPerAlign is an interpretable and sample-efficient hypothesis-driven personalization approach for large language models. We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment. Results demonstrate the superiority of hypothesis-driven personalization compared to preference-based fine-tuning methods.
arXiv Detail & Related papers (2025-04-29T18:01:46Z)
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories. We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z)
AdaptRec: A Self-Adaptive Framework for Sequential Recommendations with Large Language Models [10.52052172996229]
AdaptRec is a self-adaptive framework that leverages Large Language Models for sequential recommendations by incorporating explicit collaborative signals. We develop a User-Contextualized Recommendation Prompt that translates users' behavior sequences into natural language, explicitly integrating this information into the recommendation process. Experiments demonstrate AdaptRec's superior performance, with significant improvements in HitRatio@1 scores of 7.13%, 18.16%, and 10.41% across real-world datasets.
arXiv Detail & Related papers (2025-04-06T00:30:50Z)
Beyond the Binary: Capturing Diverse Preferences With Reward Regularization [15.518838657050173]
We argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
arXiv Detail & Related papers (2024-12-05T02:35:46Z)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts. RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm [154.47590401735323]
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems. This paper focuses on a challenging scenario where a user has multiple categories of interests. We propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML).
arXiv Detail & Related papers (2022-09-30T08:02:18Z)
