DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution
- URL: http://arxiv.org/abs/2504.15176v1
- Date: Mon, 21 Apr 2025 15:35:48 GMT
- Title: DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution
- Authors: Miaomiao Cai, Simiao Li, Wei Li, Xudong Huang, Hanting Chen, Jie Hu, Yunhe Wang
- Abstract summary: We introduce human preference alignment into Real-ISR, a technique that has been successfully applied in Large Language Models and Text-to-Image tasks. We propose Direct Semantic Preference Optimization (DSPO) to align instance-level human preferences by incorporating semantic guidance. As a plug-and-play solution, DSPO proves highly effective in both one-step and multi-step SR frameworks.
- Score: 24.460369372304807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion models have improved Real-World Image Super-Resolution (Real-ISR), but existing methods lack human feedback integration, risking misalignment with human preferences and potentially leading to artifacts, hallucinations, and harmful content generation. To this end, we are the first to introduce human preference alignment into Real-ISR, a technique that has been successfully applied in Large Language Models and Text-to-Image tasks to effectively align generated outputs with human preferences. Specifically, we introduce Direct Preference Optimization (DPO) into Real-ISR to achieve alignment, where DPO serves as a general alignment technique that learns directly from a human preference dataset. Nevertheless, unlike high-level tasks, the pixel-level reconstruction objectives of Real-ISR are difficult to reconcile with the image-level preferences of DPO, which can make DPO overly sensitive to local anomalies and reduce generation quality. To resolve this dichotomy, we propose Direct Semantic Preference Optimization (DSPO) to align instance-level human preferences by incorporating semantic guidance through two strategies: (a) a semantic instance alignment strategy, which implements instance-level alignment to ensure fine-grained perceptual consistency, and (b) a user description feedback strategy, which mitigates hallucinations through semantic textual feedback on instance-level images. As a plug-and-play solution, DSPO proves highly effective in both one-step and multi-step SR frameworks.
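For reference, a minimal sketch of the standard DPO objective that this kind of preference alignment builds on. The notation here is assumed rather than taken from the paper: a low-resolution input $x$, preferred and dispreferred restorations $y_w$ and $y_l$, the trainable policy $\pi_\theta$, a frozen reference model $\pi_{\mathrm{ref}}$, and a temperature $\beta$; the paper's instance-level DSPO variant additionally injects semantic guidance and is not reproduced here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
\right]
```

Per the abstract, DSPO can be read as applying this kind of preference comparison at the instance level, with textual feedback used to penalize hallucinated content.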
Related papers
- DiffRIS: Enhancing Referring Remote Sensing Image Segmentation with Pre-trained Text-to-Image Diffusion Models [9.109484087832058]
DiffRIS is a novel framework that harnesses the semantic understanding capabilities of pre-trained text-to-image diffusion models for RRSIS tasks. Our framework introduces two key innovations: a context perception adapter (CP-adapter) and a progressive cross-modal reasoning decoder (PCMRD).
arXiv Detail & Related papers (2025-06-23T02:38:56Z)
- SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization [19.087540230261684]
Previous text-to-image diffusion models typically employ supervised fine-tuning to enhance pre-trained base models. We introduce Self-sUpervised Direct preference Optimization (SUDO), a novel paradigm that optimizes both fine-grained details at the pixel level and global image quality. As an effective alternative to supervised fine-tuning, SUDO can be seamlessly applied to any text-to-image diffusion model.
arXiv Detail & Related papers (2025-04-20T08:18:27Z)
- Leveraging Robust Optimization for LLM Alignment under Distribution Shifts [54.654823811482665]
Large language models (LLMs) increasingly rely on preference alignment methods to steer outputs toward human values.
Recent approaches have turned to synthetic data generated by LLMs as a scalable alternative.
We propose a novel distribution-aware optimization framework that improves preference alignment in the presence of such shifts.
arXiv Detail & Related papers (2025-04-08T09:14:38Z)
- A Survey of Direct Preference Optimization [103.59317151002693]
Large Language Models (LLMs) have demonstrated unprecedented generative capabilities. Their alignment with human values remains critical for ensuring helpful and harmless deployments. Direct Preference Optimization (DPO) has recently gained prominence as a streamlined alternative.
arXiv Detail & Related papers (2025-03-12T08:45:15Z)
- Distributionally Robust Direct Preference Optimization [15.328510632723505]
A major challenge in aligning large language models with human preferences is the issue of distribution shift. We develop two novel distributionally robust direct preference optimization algorithms, namely Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO); a generic sketch of this kind of robust objective follows the entry. Our empirical experiments demonstrate the superior performance of WDPO and KLDPO in substantially improving the alignment when there is a preference distribution shift.
arXiv Detail & Related papers (2025-02-04T02:03:19Z)
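The distributionally robust variants above can be summarized, assuming a generic uncertainty ball of radius $\rho$ around the empirical preference distribution $\hat{P}$ (Kullback-Leibler divergence for KLDPO, Wasserstein distance for WDPO) and writing $\ell_{\mathrm{DPO}}$ for the per-sample DPO loss, as a min-max objective; the papers' exact formulations may differ.

```latex
\min_{\theta}\;
\max_{Q:\; D\left(Q,\, \hat{P}\right) \le \rho}\;
\mathbb{E}_{(x,\,y_w,\,y_l)\sim Q}
\left[ \ell_{\mathrm{DPO}}(\theta;\, x,\, y_w,\, y_l) \right]
```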
- The Hitchhiker's Guide to Human Alignment with *PO [43.4130314879284]
We focus on identifying the algorithm that, while being performant, is simultaneously more robust to varying hyperparameters.
Our analysis reveals that the widely adopted DPO method consistently produces lengthy responses of inferior quality.
Motivated by these findings, we propose an embarrassingly simple extension to the DPO algorithm, LN-DPO, resulting in more concise responses without sacrificing quality.
arXiv Detail & Related papers (2024-07-21T17:35:20Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier [0.5120567378386615]
We propose a unified approach to aligning large language models (LLMs). Based on a simple decomposition of preference and auxiliary objectives, we allow for tuning LLMs to optimize user and designer preferences.
arXiv Detail & Related papers (2024-05-28T08:35:48Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss; a generic sketch of such a combined objective follows the entry.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
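A hedged sketch of the combined objective described in the entry above, assuming the supervised term is the negative log-likelihood of the preferred response $y_w$ and $\lambda$ is a weighting hyperparameter (both assumptions, not necessarily the paper's exact regularizer):

```latex
\mathcal{L}(\theta)
= \mathcal{L}_{\mathrm{pref}}(\theta)
\;-\; \lambda\,\mathbb{E}_{(x,\,y_w)\sim\mathcal{D}}
\left[ \log \pi_\theta(y_w \mid x) \right]
```

Keeping the supervised term anchors the policy to the preferred data, which is consistent with the entry's claim that the SFT loss acts as a regularizer against overoptimization.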
- RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models [7.676477609461592]
Reinforcement learning from human feedback (RLHF) has been extensively employed to align large language models with user intent.
DPO relies on contrastive responses generated from a human annotator and an alternative LLM, instead of the policy model.
In this paper, we address both challenges by systematically combining rejection sampling (RS) and DPO.
Our proposed method effectively fine-tunes LLMs in limited-resource environments, leading to improved alignment with user intent.
arXiv Detail & Related papers (2024-02-15T16:00:58Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the alignment problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower-level optimal variable; a generic bilevel template follows the entry.
Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
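A generic bilevel template of the kind PARL instantiates, with an assumed upper-level alignment/reward-design variable $\lambda$, a lower-level policy parameter $\theta$, and placeholder objectives $f$ and $g$ standing in for the paper's concrete alignment and policy-optimization objectives:

```latex
\min_{\lambda}\; f\!\left(\lambda,\, \theta^{*}(\lambda)\right)
\quad \text{s.t.} \quad
\theta^{*}(\lambda) \in \arg\max_{\theta}\; g(\lambda,\, \theta)
```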
- Revisiting Deep Subspace Alignment for Unsupervised Domain Adaptation [42.16718847243166]
Unsupervised domain adaptation (UDA) aims to transfer and adapt knowledge from a labeled source domain to an unlabeled target domain.
Traditionally, subspace-based methods form an important class of solutions to this problem.
This paper revisits the use of subspace alignment for UDA and proposes a novel adaptation algorithm that consistently leads to improved generalization.
arXiv Detail & Related papers (2022-01-05T20:16:38Z)
- Domain Adaptive Person Re-Identification via Coupling Optimization [58.567492812339566]
Domain adaptive person Re-Identification (ReID) is challenging owing to the domain gap and shortage of annotations on target scenarios.
This paper proposes a coupling optimization method including the Domain-Invariant Mapping (DIM) method and the Global-Local distance Optimization (GLO).
GLO is designed to train the ReID model in an unsupervised setting on the target domain.
arXiv Detail & Related papers (2020-11-06T14:01:03Z)