A Probabilistic Approach for Model Alignment with Human Comparisons
- URL: http://arxiv.org/abs/2403.10771v2
- Date: Sat, 01 Feb 2025 21:28:10 GMT
- Title: A Probabilistic Approach for Model Alignment with Human Comparisons
- Authors: Junyu Cao, Mohsen Bayati
- Abstract summary: We develop a theoretical framework for analyzing the conditions under which human comparisons can enhance the traditional supervised learning process.
We propose a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback.
- Score: 7.6656660956453635
- Abstract: A growing trend involves integrating human knowledge into learning frameworks, leveraging subtle human feedback to refine AI models. While these approaches have shown promising results in practice, the theoretical understanding of when and why such approaches are effective remains limited. This work takes steps toward developing a theoretical framework for analyzing the conditions under which human comparisons can enhance the traditional supervised learning process. Specifically, this paper studies the effective use of noisy-labeled data and human comparison data to address challenges arising from noisy environments and high-dimensional models. We propose a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback through a probabilistic bisection approach. The two-stage framework first learns low-dimensional representations from noisy-labeled data via an SL procedure and then uses human comparisons to improve the model alignment. To examine the efficacy of the alignment phase, we introduce a concept, termed the "label-noise-to-comparison-accuracy" (LNCA) ratio. This paper identifies from a theoretical perspective the conditions under which the "SL+LHF" framework outperforms the pure SL approach; we then leverage this LNCA ratio to highlight the advantage of incorporating human evaluators in reducing sample complexity. We validate that the LNCA ratio meets the proposed conditions for its use through a case study conducted via Amazon Mechanical Turk (MTurk).
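The alignment phase described above builds on probabilistic bisection, a classic procedure for locating an unknown threshold from noisy binary responses. The sketch below is a generic illustration of that underlying technique (Horstein-style median queries with multiplicative posterior updates), not the paper's exact procedure; the grid size, query budget, and oracle accuracy `p` are illustrative assumptions.

```python
import random

def probabilistic_bisection(oracle, p, n_queries=60, grid_size=10_000):
    """Locate an unknown threshold x* in [0, 1] from noisy comparisons.

    oracle(x) answers "is x below x*?" and is correct with probability
    p > 0.5. A discretized posterior density over [0, 1] is maintained;
    each round queries the posterior median and reweights the two halves
    multiplicatively according to the response.
    """
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    density = [1.0 / grid_size] * grid_size  # uniform prior over the grid

    def median_index():
        # Index of the posterior median (first point where the CDF >= 1/2).
        acc = 0.0
        for i, d in enumerate(density):
            acc += d
            if acc >= 0.5:
                return i
        return grid_size - 1

    for _ in range(n_queries):
        m = xs[median_index()]
        to_the_right = oracle(m)  # response claims x* lies above m
        # Boost the half the response points to by 2p, shrink the other
        # half by 2(1-p), then renormalize.
        for i in range(grid_size):
            if (xs[i] >= m) == to_the_right:
                density[i] *= 2 * p
            else:
                density[i] *= 2 * (1 - p)
        total = sum(density)
        density = [d / total for d in density]

    return xs[median_index()]  # posterior-median estimate of x*


# Usage: a comparison oracle that answers correctly with probability 0.8.
random.seed(0)
x_star = 0.37
noisy = lambda x: (x < x_star) == (random.random() < 0.8)
estimate = probabilistic_bisection(noisy, p=0.8)
```

The appeal of the median-query rule is that the posterior concentrates around the true threshold at an exponential rate even though roughly one in five responses is wrong, which mirrors the paper's point that moderately accurate human comparisons can still reduce sample complexity.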
Related papers
- CauSkelNet: Causal Representation Learning for Human Behaviour Analysis [6.880536510094897]
This study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors.
Our approach advances human motion analysis and paves the way for more adaptive intelligent healthcare solutions.
arXiv Detail & Related papers (2024-09-23T21:38:49Z) - Using Self-supervised Learning Can Improve Model Fairness [10.028637666224093]
Self-supervised learning (SSL) has become the de facto training paradigm of large models.
This study explores the impact of pre-training and fine-tuning strategies on fairness.
We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes.
arXiv Detail & Related papers (2024-06-04T14:38:30Z) - Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Learnability of Competitive Threshold Models [11.005966612053262]
We study the learnability of the competitive threshold model from a theoretical perspective.
We demonstrate how competitive threshold models can be seamlessly simulated by artificial neural networks.
arXiv Detail & Related papers (2022-05-08T01:11:51Z) - A Unified Contrastive Energy-based Model for Understanding the Generative Ability of Adversarial Training [64.71254710803368]
Adversarial Training (AT) is an effective approach to enhance the robustness of deep neural networks.
We demystify this phenomenon by developing a unified probabilistic framework, called Contrastive Energy-based Models (CEM).
We propose a principled method to develop adversarial learning and sampling methods.
arXiv Detail & Related papers (2022-03-25T05:33:34Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning [0.0]
We propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics to aid the model-free learner.
We demonstrate this by presenting a technique for matching the gradient information from the model-based learner with the model-free component in an abstract low-dimensional space.
arXiv Detail & Related papers (2020-05-28T05:02:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.