Related papers: Binary Classifier Optimization for Large Language Model Alignment

Binary Classifier Optimization for Large Language Model Alignment

URL: http://arxiv.org/abs/2404.04656v1
Date: Sat, 6 Apr 2024 15:20:59 GMT
Title: Binary Classifier Optimization for Large Language Model Alignment
Authors: Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On,
Abstract summary: We present theoretical foundations to explain the successful alignment achieved through binary signals. We identify two techniques for effective alignment: reward shift and underlying distribution matching. Our model consistently demonstrates effective and robust alignment across two base LLMs and three different binary signal datasets.
Score: 4.61411484523337
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, necessitating for each prompt a comparison of both a chosen and a rejected text completion by evaluators. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-completion pair. In this paper, we present theoretical foundations to explain the successful alignment achieved through these binary signals. Our analysis uncovers a new perspective: optimizing a binary classifier, whose logit is a reward, implicitly induces minimizing the Direct Preference Optimization (DPO) loss. In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching. Consequently, we propose a new algorithm, \textit{Binary Classifier Optimization}, that integrates the techniques. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO and KTO; and second, on binary signal datasets simulating real-world conditions with divergent underlying distributions between thumbs-up and thumbs-down data. Our model consistently demonstrates effective and robust alignment across two base LLMs and three different binary signal datasets, showcasing the strength of our approach to learning from binary feedback.

Related papers

Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment [5.276657230880984]
Large language models (LLMs) demonstrate strong generalization across a wide range of language tasks, but often generate outputs that misalign with human preferences.<n>Direct Optimization Preference (DPO) simplifies the process by treating alignment as a classification task over binary preference pairs.<n>We propose Multi-Preference Lambda-weighted Listwise DPO, which allows the model to learn from more detailed human feedback.<n>Our method consistently outperforms standard DPO on alignment while enabling efficient, controllable, and fine-grained adaptation suitable for real-world deployment.
arXiv Detail & Related papers (2025-06-24T16:47:17Z)
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences [13.588231827053923]
Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data.<n>We propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective.<n>Our approach effectively mitigates issues of excessive optimization and objective misalignment present in existing methods.
arXiv Detail & Related papers (2025-06-03T09:47:22Z)
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data? [34.18909976476456]
We show that the distinguishing signal obtained in preferred responses is often concentrated in the early tokens.<n>Surprisingly, models trained on truncated datasets, retaining only the first half or fewer tokens, achieve comparable or even superior performance to those trained on full datasets.<n>We consider two simple decoding strategies motivated by the shallow reward signal observation, namely Length Control Decoding and KL Threshold Control Decoding, which leverage shallow preference signals to optimize the trade-off between alignment and computational efficiency.
arXiv Detail & Related papers (2025-05-21T17:59:02Z)
Calibrated Multi-Preference Optimization for Aligning Diffusion Models [92.90660301195396]
Calibrated Preference Optimization (CaPO) is a novel method to align text-to-image (T2I) diffusion models. CaPO incorporates the general preference from multiple reward models without human annotated data. Experimental results show that CaPO consistently outperforms prior methods.
arXiv Detail & Related papers (2025-02-04T18:59:23Z)
Graph-Sequential Alignment and Uniformity: Toward Enhanced Recommendation Systems [51.716704243764994]
Our framework uses Graph Neural Network (GNN)-based and sequential recommenders as separate submodules while sharing a unified embedding space optimized jointly. Experiments on three real-world datasets demonstrate that the proposed method significantly outperforms using either approach alone.
arXiv Detail & Related papers (2024-12-05T15:59:05Z)
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z)
Preference Alignment Improves Language Model-Based TTS [76.70693823683091]
preference alignment algorithms adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content. With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores.
arXiv Detail & Related papers (2024-09-19T01:58:19Z)
Geometric-Averaged Preference Optimization for Soft Preference Labels [78.2746007085333]
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic.<n>In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function.
arXiv Detail & Related papers (2024-09-10T17:54:28Z)
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC.<n>We increase the consistency and informativeness of the pairwise preference signals through targeted modifications.<n>We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z)
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data. Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts. RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment [105.34140537748546]
We propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained quality signals that are derived by contrasting good and bad responses. Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding revised ones. Secondly, we devise a new loss function can leverage fine-grained quality signals to instruct the learning of LLMs for alignment.
arXiv Detail & Related papers (2023-11-07T15:36:40Z)
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel textscAdmeta (textbfADouble exponential textbfMov averagtextbfE textbfAdaptive and non-adaptive momentum) framework. We provide two implementations, textscAdmetaR and textscAdmetaS, the former based on RAdam and the latter based on SGDM.
arXiv Detail & Related papers (2023-07-02T18:16:06Z)
FOSI: Hybrid First and Second Order Optimization [11.447526245792154]
We present FOSI, a novel metaalgorithm that improves the performance of any base first-order by efficiently incorporating second-order information during the optimization process. Our empirical evaluation demonstrates that FOSI improves the convergence rate and optimization time of first-order methods such as Heavy-Ball and Adam, and outperforms second-order methods (K-FAC and L-BFGS)
arXiv Detail & Related papers (2023-02-16T18:45:46Z)
DAG Learning on the Permutahedron [33.523216907730216]
We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimize over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering.
arXiv Detail & Related papers (2023-01-27T18:22:25Z)
AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets [27.022212653067367]
This paper studies the Binary Neural Networks (BNNs) in which weights and activations are both binarized into 1-bit values. We present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets. Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin is able to achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-08-17T05:43:33Z)
Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation. We develop a new adversarial learning based method, which is simple and efficient to apply. We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
On Second-order Optimization Methods for Federated Learning [59.787198516188425]
We evaluate the performance of several second-order distributed methods with local steps in the federated learning setting. We propose a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
arXiv Detail & Related papers (2021-09-06T12:04:08Z)
Domain Adaptive Person Re-Identification via Coupling Optimization [58.567492812339566]
Domain adaptive person Re-Identification (ReID) is challenging owing to the domain gap and shortage of annotations on target scenarios. This paper proposes a coupling optimization method including the Domain-Invariant Mapping (DIM) method and the Global-Local distance Optimization ( GLO) GLO is designed to train the ReID model with unsupervised setting on the target domain.
arXiv Detail & Related papers (2020-11-06T14:01:03Z)
Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning. We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.