Related papers: A Probabilistic Approach for Model Alignment with Human Comparisons

URL: http://arxiv.org/abs/2403.10771v2
Date: Sat, 01 Feb 2025 21:28:10 GMT
Title: A Probabilistic Approach for Model Alignment with Human Comparisons
Authors: Junyu Cao, Mohsen Bayati
Abstract summary: We develop a theoretical framework for analyzing the conditions under which human comparisons can enhance the traditional supervised learning process. We propose a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback.
Score: 7.6656660956453635
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A growing trend involves integrating human knowledge into learning frameworks, leveraging subtle human feedback to refine AI models. While these approaches have shown promising results in practice, the theoretical understanding of when and why such approaches are effective remains limited. This work takes steps toward developing a theoretical framework for analyzing the conditions under which human comparisons can enhance the traditional supervised learning process. Specifically, this paper studies the effective use of noisy-labeled data and human comparison data to address challenges arising from noisy environments and high-dimensional models. We propose a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback through a probabilistic bisection approach. The two-stage framework first learns low-dimensional representations from noisy-labeled data via an SL procedure and then uses human comparisons to improve the model alignment. To examine the efficacy of the alignment phase, we introduce a concept termed the "label-noise-to-comparison-accuracy" (LNCA) ratio. This paper identifies, from a theoretical perspective, the conditions under which the "SL+LHF" framework outperforms the pure SL approach; we then leverage this LNCA ratio to highlight the advantage of incorporating human evaluators in reducing sample complexity. We validate that the LNCA ratio meets the proposed conditions for its use through a case study conducted via Amazon Mechanical Turk (MTurk).
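
Illustrative sketch (not taken from the paper): the abstract mentions a probabilistic bisection approach for using noisy human comparisons. The Python below is a generic, textbook-style probabilistic bisection over a single scalar in [0, 1], not the authors' SL+LHF procedure. The names `probabilistic_bisection`, `make_noisy_oracle`, and `is_greater` are hypothetical, and the sketch assumes the comparison accuracy is known and constant.

```python
import random
from typing import Callable, List


def _median_cell(weights: List[float]) -> int:
    """Return the index of the first cell where cumulative mass reaches 0.5."""
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if cumulative >= 0.5:
            return i
    return len(weights) - 1


def probabilistic_bisection(
    is_greater: Callable[[float], bool],
    accuracy: float = 0.8,
    steps: int = 200,
    cells: int = 1000,
) -> float:
    """Estimate an unknown target in [0, 1] from noisy comparisons.

    is_greater(x) is a hypothetical comparison oracle reporting whether the
    target lies above x. It is ASSUMED to be correct with probability
    `accuracy` (strictly between 0.5 and 1).
    """
    if not 0.5 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0.5 and 1")

    # Uniform prior over equal-width cells of [0, 1].
    weights = [1.0 / cells] * cells

    for _ in range(steps):
        m = _median_cell(weights)
        query = (m + 1) / cells  # right edge of the posterior-median cell
        says_greater = is_greater(query)
        # Bayes update: up-weight the side the answer points to.
        for i in range(cells):
            on_right = i > m
            agrees = on_right == says_greater
            weights[i] *= accuracy if agrees else 1.0 - accuracy
        total = sum(weights)
        weights = [w / total for w in weights]

    # Report the midpoint of the posterior-median cell.
    return (_median_cell(weights) + 0.5) / cells


def make_noisy_oracle(target: float, accuracy: float, seed: int = 0):
    """Build a simulated comparer that answers correctly w.p. `accuracy`."""
    rng = random.Random(seed)

    def is_greater(x: float) -> bool:
        truth = target > x
        return truth if rng.random() < accuracy else not truth

    return is_greater
```

A call such as `probabilistic_bisection(make_noisy_oracle(0.3, 0.8), accuracy=0.8)` simulates a comparer who is right 80% of the time. Querying at the posterior median is the design choice that gives the method its name: each answer, even a noisy one, shifts roughly half of the remaining probability mass.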

Related papers

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback [13.478503755314344]
Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence. We propose a novel statistical framework to simultaneously conduct the online decision-making and statistical inference on the optimal model. We apply the proposed framework to analyze the human preference data for ranking large language models on the Massive Multitask Language Understanding dataset.
arXiv Detail & Related papers (2025-04-27T19:59:11Z)
CauSkelNet: Causal Representation Learning for Human Behaviour Analysis [6.880536510094897]
This study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors. Our approach advances human motion analysis and paves the way for more adaptive intelligent healthcare solutions.
arXiv Detail & Related papers (2024-09-23T21:38:49Z)
Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss [3.435381469869212]
This paper presents an innovative regression framework for Sentence-BERT STS tasks. It proposes two simple yet effective loss functions: Translated ReLU and Smooth K2 Loss. Experimental results demonstrate that our method achieves convincing performance across seven established STS benchmarks.
arXiv Detail & Related papers (2024-06-08T02:52:43Z)
Using Self-supervised Learning Can Improve Model Fairness [10.028637666224093]
Self-supervised learning (SSL) has become the de facto training paradigm of large models. This study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes.
arXiv Detail & Related papers (2024-06-04T14:38:30Z)
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model. We show that the proposed algorithms converge to the stationary solutions of the IRL problem. Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Counterfactual Data Augmentation with Contrastive Learning [27.28511396131235]
We introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. We use contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group.
arXiv Detail & Related papers (2023-11-07T00:36:51Z)
SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision. We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making. We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL). We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
Learnability of Competitive Threshold Models [11.005966612053262]
We study the learnability of the competitive threshold model from a theoretical perspective. We demonstrate how competitive threshold models can be seamlessly simulated by artificial neural networks.
arXiv Detail & Related papers (2022-05-08T01:11:51Z)
A Unified Contrastive Energy-based Model for Understanding the Generative Ability of Adversarial Training [64.71254710803368]
Adversarial Training (AT) is an effective approach to enhance the robustness of deep neural networks. We demystify this phenomenon by developing a unified probabilistic framework, called Contrastive Energy-based Models (CEM). We propose a principled method to develop adversarial learning and sampling methods.
arXiv Detail & Related papers (2022-03-25T05:33:34Z)
The Efficacy of Pessimism in Asynchronous Q-Learning [17.193902915070506]
We develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning. This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our results deliver the first theoretical support for the use of pessimism principle in the presence of Markovian non-i.i.d. data.
arXiv Detail & Related papers (2022-03-14T17:59:01Z)
MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio. We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed. We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA).
arXiv Detail & Related papers (2020-05-28T13:50:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.