Granular feedback merits sophisticated aggregation
- URL: http://arxiv.org/abs/2507.12041v1
- Date: Wed, 16 Jul 2025 08:58:27 GMT
- Title: Granular feedback merits sophisticated aggregation
- Authors: Anmol Kagrecha, Henrik Marklund, Potsawee Manakul, Richard Zeckhauser, Benjamin Van Roy
- Abstract summary: We show that, as feedback granularity increases, one can substantially improve upon predictions of regularized averaging. In particular, with binary feedback, sophistication barely reduces the number of individuals required to attain a fixed level of performance.
- Score: 27.268860235599973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human feedback is increasingly used across diverse applications like training AI models, developing recommender systems, and measuring public opinion -- with granular feedback often being preferred over binary feedback for its greater informativeness. While it is easy to accurately estimate a population's distribution of feedback given feedback from a large number of individuals, cost constraints typically necessitate using smaller groups. A simple method to approximate the population distribution is regularized averaging: compute the empirical distribution and regularize it toward a prior. Can we do better? As we will discuss, the answer to this question depends on feedback granularity. Suppose one wants to predict a population's distribution of feedback using feedback from a limited number of individuals. We show that, as feedback granularity increases, one can substantially improve upon predictions of regularized averaging by combining individuals' feedback in ways more sophisticated than regularized averaging. Our empirical analysis using questions on social attitudes confirms this pattern. In particular, with binary feedback, sophistication barely reduces the number of individuals required to attain a fixed level of performance. By contrast, with five-point feedback, sophisticated methods match the performance of regularized averaging with about half as many individuals.
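The baseline the abstract describes, regularized averaging, computes the empirical distribution of feedback and shrinks it toward a prior. A minimal sketch of this idea follows; the function name, the uniform prior, and the shrinkage weight `alpha` are illustrative choices, not taken from the paper.

```python
from collections import Counter

def regularized_average(responses, prior, alpha=0.3):
    """Estimate a population's feedback distribution from a small sample.

    responses: feedback from individuals, as category indices 0..k-1
               (e.g. 0..4 for five-point feedback)
    prior:     assumed prior distribution over the k categories
    alpha:     weight of the prior in the convex combination (illustrative)
    """
    k = len(prior)
    n = len(responses)
    counts = Counter(responses)
    # Empirical distribution over the k feedback categories
    empirical = [counts.get(c, 0) / n for c in range(k)]
    # Shrink the empirical distribution toward the prior
    return [(1 - alpha) * e + alpha * p for e, p in zip(empirical, prior)]

# Five-point feedback from a small group, with a uniform prior
estimate = regularized_average([4, 3, 4, 2, 4], prior=[0.2] * 5, alpha=0.3)
```

The paper's point is that with granular (e.g. five-point) feedback, methods more sophisticated than this convex combination can match its accuracy with roughly half as many respondents.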
Related papers
- ProgRoCC: A Progressive Approach to Rough Crowd Counting [66.09510514180593]
We introduce Rough Crowd Counting, a task that delivers better accuracy on the basis of training data that is easier to acquire. We propose an approach to the rough crowd counting problem based on CLIP, termed ProgRoCC. Specifically, we introduce a progressive estimation learning strategy that determines the object count through a coarse-to-fine approach.
arXiv Detail & Related papers (2025-04-18T01:57:42Z) - Variational Bayesian Personalized Ranking [39.24591060825056]
Variational BPR is a novel and easily implementable learning objective that integrates likelihood optimization, noise reduction, and popularity debiasing. We introduce an attention-based latent interest prototype contrastive mechanism, replacing instance-level contrastive learning, to effectively reduce noise from problematic samples. Empirically, we demonstrate the effectiveness of Variational BPR on popular backbone recommendation models.
arXiv Detail & Related papers (2025-03-14T04:22:01Z) - Adaptive Querying for Reward Learning from Human Feedback [5.587293092389789]
Learning from human feedback is a popular approach to train robots to adapt to user preferences and improve safety. We examine how to learn a penalty function associated with unsafe behaviors, such as side effects, using multiple forms of human feedback. We employ an iterative, two-phase approach which first selects critical states for querying, and then uses information gain to select a feedback format for querying.
arXiv Detail & Related papers (2024-12-11T00:02:48Z) - Reward Modeling with Ordinal Feedback: Wisdom of the Crowd [9.034189257088762]
Learning a reward model (RM) from human preferences has been an important component in aligning large language models.
We propose a framework for learning RMs under ordinal feedback.
We prove the statistical benefits of ordinal feedback in terms of reducing the Rademacher complexity.
arXiv Detail & Related papers (2024-11-19T20:17:04Z) - ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z) - Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt-alignment.
We show that the superiority of fine-grained feedback over coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z) - Crowd-PrefRL: Preference-Based Reward Learning from Crowds [0.4439066410935887]
We introduce a conceptual framework, Crowd-PrefRL, that integrates preference-based reinforcement learning approaches with crowdsourcing techniques. Preliminary results suggest that Crowd-PrefRL can learn reward functions and agent policies from preference feedback provided by crowds of unknown expertise and reliability. Results suggest that our method can identify the presence of minority viewpoints within the crowd in an unsupervised manner.
arXiv Detail & Related papers (2024-01-17T18:06:17Z) - Correcting the User Feedback-Loop Bias for Recommendation Systems [34.44834423714441]
We propose a systematic and dynamic way to correct user feedback-loop bias in recommendation systems.
Our method includes a deep-learning component to learn each user's dynamic rating history embedding.
We empirically validated the existence of such user feedback-loop bias in real world recommendation systems.
arXiv Detail & Related papers (2021-09-13T15:02:55Z) - Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
arXiv Detail & Related papers (2021-04-26T18:54:31Z) - Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z) - Dialogue Response Ranking Training with Large-Scale Human Feedback Data [52.12342165926226]
We leverage social media feedback data to build a large-scale training dataset for feedback prediction.
We train DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data.
Our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback.
arXiv Detail & Related papers (2020-09-15T10:50:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.