CPO: Condition Preference Optimization for Controllable Image Generation
- URL: http://arxiv.org/abs/2511.04753v1
- Date: Thu, 06 Nov 2025 19:02:06 GMT
- Title: CPO: Condition Preference Optimization for Controllable Image Generation
- Authors: Zonglin Lyu, Ming Li, Xinxin Liu, Chen Chen,
- Abstract summary: ControlNet introduces image-based control signals in text-to-image generation.<n>ControlNet++ improves pixel-level cycle consistency between generated images and the input control signal.<n>We propose performing preference learning over control conditions rather than generated images.
- Score: 9.511718254502199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To enhance controllability in text-to-image generation, ControlNet introduces image-based control signals, while ControlNet++ improves pixel-level cycle consistency between generated images and the input control signal. To avoid the prohibitive cost of back-propagating through the sampling process, ControlNet++ optimizes only low-noise timesteps (e.g., $t < 200$) using a single-step approximation, which not only ignores the contribution of high-noise timesteps but also introduces additional approximation errors. A straightforward alternative for optimizing controllability across all timesteps is Direct Preference Optimization (DPO), a fine-tuning method that increases model preference for more controllable images ($I^{w}$) over less controllable ones ($I^{l}$). However, due to uncertainty in generative models, it is difficult to ensure that win--lose image pairs differ only in controllability while keeping other factors, such as image quality, fixed. To address this, we propose performing preference learning over control conditions rather than generated images. Specifically, we construct winning and losing control signals, $\mathbf{c}^{w}$ and $\mathbf{c}^{l}$, and train the model to prefer $\mathbf{c}^{w}$. This method, which we term \textit{Condition Preference Optimization} (CPO), eliminates confounding factors and yields a low-variance training objective. Our approach theoretically exhibits lower contrastive loss variance than DPO and empirically achieves superior results. Moreover, CPO requires less computation and storage for dataset curation. Extensive experiments show that CPO significantly improves controllability over the state-of-the-art ControlNet++ across multiple control types: over $10\%$ error rate reduction in segmentation, $70$--$80\%$ in human pose, and consistent $2$--$5\%$ reductions in edge and depth maps.
Related papers
- Learn to Guide Your Diffusion Model [84.82855046749657]
We study a technique for improving quality of samples from conditional diffusion models.<n>We learn guidance weights $omega_c,(s,t)$, which are functions of the conditioning $c$, the time $t$ from which we denoise, and the time $s$ towards which we denoise.<n>We extend our framework to reward guided sampling, enabling the model to target distributions tilted by a reward function.
arXiv Detail & Related papers (2025-10-01T12:21:48Z) - Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? [20.004349891563706]
After pre-training, large language models are aligned with human preferences based on pairwise comparisons.<n>We introduce an alignment method's distortion: the worst-case ratio between the optimal achievable average utility, and the average utility of the learned policy.
arXiv Detail & Related papers (2025-05-29T17:59:20Z) - Group Relative Policy Optimization for Image Captioning [1.9606373630214207]
We propose using the latest Group Relative Policy Optimization (GRPO) reinforcement learning algorithm as an optimization solution for the second stage.<n>By constraining the amplitude of policy updates and KL divergence, the stability of the model during training is greatly guaranteed.
arXiv Detail & Related papers (2025-03-03T09:16:41Z) - ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [20.910939141948123]
ControlNet++ is a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls.
It achieves improvements over ControlNet by 11.1% mIoU, 13.4% SSIM, and 7.6% RMSE, respectively, for segmentation mask, line-art edge, and depth conditions.
arXiv Detail & Related papers (2024-04-11T17:59:09Z) - Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for optimal control.
The control is learned via a least squares problem by trying to fit a matching vector field.
Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
arXiv Detail & Related papers (2023-12-04T16:49:43Z) - Expressive Losses for Verified Robustness via Convex Combinations [67.54357965665676]
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
arXiv Detail & Related papers (2023-05-23T12:20:29Z) - Follow the Clairvoyant: an Imitation Learning Approach to Optimal
Control [4.978565634673048]
We consider control of dynamical systems through the lens of competitive analysis.
Motivated by the observation that the optimal cost only provides coarse information about the ideal closed-loop behavior, we propose directly minimizing the tracking error.
arXiv Detail & Related papers (2022-11-14T14:15:12Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an em early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - On Measuring and Controlling the Spectral Bias of the Deep Image Prior [63.88575598930554]
The deep image prior has demonstrated the remarkable ability that untrained networks can address inverse imaging problems.
It requires an oracle to determine when to stop the optimization as the performance degrades after reaching a peak.
We study the deep image prior from a spectral bias perspective to address these problems.
arXiv Detail & Related papers (2021-07-02T15:10:42Z) - Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image
Classification [97.81205777897043]
Recent work has shown that convolutional neural network classifiers overly rely on texture at the expense of shape cues.
We make a similar but different distinction between shape and local image cues, on the one hand, and global image statistics, on the other.
Our method, called Permuted Adaptive Instance Normalization (pAdaIN), reduces the representation of global statistics in the hidden layers of image classifiers.
arXiv Detail & Related papers (2020-10-09T16:38:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.