Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation
- URL: http://arxiv.org/abs/2602.13055v1
- Date: Fri, 13 Feb 2026 16:09:31 GMT
- Title: Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation
- Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah
- Abstract summary: We introduce Curriculum-DPO++, which combines a data-level curriculum that organizes image pairs by difficulty with a model-level curriculum that dynamically increases the learning capacity of the denoising network as training advances.
- Score: 103.29651633424855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). However, neither RLHF nor DPO takes into account the fact that learning certain preferences is more difficult than learning other preferences, rendering the optimization process suboptimal. To address this gap in text-to-image generation, we recently proposed Curriculum-DPO, a method that organizes image pairs by difficulty. In this paper, we introduce Curriculum-DPO++, an enhanced method that combines the original data-level curriculum with a novel model-level curriculum. More precisely, we propose to dynamically increase the learning capacity of the denoising network as training advances. We implement this capacity increase via two mechanisms. First, we initialize the model with only a subset of the trainable layers used in the original Curriculum-DPO. As training progresses, we sequentially unfreeze layers until the configuration matches the full baseline architecture. Second, as the fine-tuning is based on Low-Rank Adaptation (LoRA), we implement a progressive schedule for the dimension of the low-rank matrices. Instead of maintaining a fixed capacity, we initialize the low-rank matrices with a dimension significantly smaller than that of the baseline. As training proceeds, we incrementally increase their rank, allowing the capacity to grow until it converges to the same rank value as in Curriculum-DPO. Furthermore, we propose an alternative ranking strategy to the one employed by Curriculum-DPO. Finally, we compare Curriculum-DPO++ against Curriculum-DPO and other state-of-the-art preference optimization approaches on nine benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference.
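To make the two capacity-growth mechanisms concrete, below is a minimal, self-contained Python sketch of the model-level curriculum: a LoRA adapter whose effective rank grows during training, plus sequential unfreezing of adapters. This is not the authors' implementation; the `GrowingLoRALinear` class, `max_rank`, and the linear schedules are illustrative assumptions.

```python
# Hedged sketch of the model-level curriculum from the abstract: progressive
# LoRA rank growth + sequential layer unfreezing. Illustrative only.
import torch
import torch.nn as nn

class GrowingLoRALinear(nn.Module):
    """Frozen base linear layer plus a LoRA adapter whose effective rank
    can be increased mid-training by activating more rank dimensions."""

    def __init__(self, base: nn.Linear, max_rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay frozen throughout
        # Allocate the full-rank factors up front; only the first
        # `active_rank` dimensions contribute to the forward pass.
        self.lora_A = nn.Parameter(0.01 * torch.randn(max_rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.scaling = alpha / max_rank
        self.active_rank = 1  # start with a deliberately small capacity

    def grow(self, new_rank: int) -> None:
        self.active_rank = min(new_rank, self.lora_A.shape[0])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        delta = (x @ self.lora_A[:r].T) @ self.lora_B[:, :r].T
        return self.base(x) + self.scaling * delta

def model_curriculum_step(adapters, step, total_steps, max_rank=16):
    """Illustrative linear schedule: grow every adapter's rank and unfreeze
    one more adapter as training advances, until all are trainable at full rank."""
    frac = step / total_steps
    rank = max(1, round(frac * max_rank))
    n_unfrozen = max(1, round(frac * len(adapters)))
    for i, adapter in enumerate(adapters):
        adapter.grow(rank)
        trainable = i < n_unfrozen  # earlier adapters unfreeze first
        adapter.lora_A.requires_grad_(trainable)
        adapter.lora_B.requires_grad_(trainable)
```

In a DPO fine-tuning loop, `model_curriculum_step` would be called periodically (e.g., every few hundred steps); the exact rank and unfreezing schedules used in the paper may differ.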
Related papers
- APAO: Adaptive Prefix-Aware Optimization for Generative Recommendation [26.371939617653084]
Generative recommendation is an autoregressive generation process, predicting discrete tokens of the next item conditioned on user interaction histories. Existing generative recommendation models are typically trained with token-level likelihood objectives, such as cross-entropy loss. This leads to a training-inference inconsistency: standard training assumes ground-truth history is always available, ignoring the fact that beam search prunes low-probability branches during inference.
arXiv Detail & Related papers (2026-03-03T08:29:15Z) - RankPO: Preference Optimization for Job-Talent Matching [7.385902340910447]
We propose a two-stage training framework for large language models (LLMs). In the first stage, a contrastive learning approach is used to train the model on a dataset constructed from real-world matching rules. In the second stage, we introduce a novel preference-based fine-tuning method inspired by Direct Preference Optimization (DPO) to align the model with AI-curated pairwise preferences.
arXiv Detail & Related papers (2025-03-13T10:14:37Z) - Active Learning for Direct Preference Optimization [59.84525302418018]
Direct preference optimization (DPO) is a form of reinforcement learning from human feedback. We propose an active learning framework for DPO, which can be applied to collect human feedback online or to choose the most informative subset of already collected feedback offline.
arXiv Detail & Related papers (2025-03-03T00:36:31Z) - Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization [44.008094698200026]
We propose a new training paradigm termed Direct CLIP-Based Optimization (DiCO).
Our approach jointly learns and optimizes a reward model distilled from a learnable captioning evaluator with high human correlation.
DiCO not only exhibits improved stability and enhanced quality in the generated captions but also aligns more closely with human preferences compared to existing methods.
arXiv Detail & Related papers (2024-08-26T18:00:33Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the zero-shot generalization of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Curriculum Direct Preference Optimization for Diffusion and Consistency Models [110.08057135882356]
We propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on nine benchmarks.
arXiv Detail & Related papers (2024-05-22T13:36:48Z) - LiPO: Listwise Preference Optimization through Learning-to-Rank [62.02782819559389]
A policy can learn more effectively from a ranked list of plausible responses given the prompt. We show that LiPO-$\lambda$ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks.
arXiv Detail & Related papers (2024-02-02T20:08:10Z) - Ranking Creative Language Characteristics in Small Data Scenarios [52.00161818003478]
We adapt the DirectRanker to provide a new deep model for ranking creative language with small data.
Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small datasets, DirectRanker remains effective.
arXiv Detail & Related papers (2020-10-23T18:57:47Z)