Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
- URL: http://arxiv.org/abs/2506.03621v2
- Date: Tue, 30 Sep 2025 08:58:48 GMT
- Title: Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
- Authors: Chaehun Shin, Jooyoung Choi, Johan Barthelemy, Jungbeom Lee, Sungroh Yoon,
- Abstract summary: We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation.<n>SFO guides the model to favor positives over negatives through pairwise comparison.<n>For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation.
- Score: 52.18071720309418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods, which rely only on positive targets and use the diffusion loss as in the pre-training stage, often fail to capture fine-grained subject details. To address this, SFO introduces additional synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation by introducing controlled degradations that emphasize subject fidelity and text alignment without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms recent strong baselines in terms of both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
Related papers
- Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers [80.55429742713623]
ILRec is a novel preference fine-tuning framework for LLM-based recommender systems.<n>We introduce a lightweight collaborative filtering model to assign token-level rewards for negative signals.<n>Experiments on three datasets demonstrate ILRec's effectiveness in enhancing the performance of LLM-based recommender systems.
arXiv Detail & Related papers (2026-02-19T14:37:43Z) - Finetuning-Free Personalization of Text to Image Generation via Hypernetworks [15.129799519953139]
We introduce fine-tuning-free personalization via Hypernetworks that predict LoRA-adapted weights directly from subject images.<n>Our approach achieves strong personalization performance and highlights the promise of hypernetworks as a scalable and effective direction for open-category personalization.
arXiv Detail & Related papers (2025-11-05T03:31:33Z) - Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling [0.0]
This paper introduces a novel framework based on intuitionistic fuzzy sets (IFS) for modeling and aggregating human preferences in large language models (LLMs)<n>Our approach captures not only the degree of preference but also the uncertainty and hesitation inherent in human judgment through membership, non-membership, and hesitation degrees.<n> Experimental validation on multiple datasets demonstrates that our IFS-based approach significantly improves annotation consistency, reduces annotator fatigue, and produces higher-quality preference data.
arXiv Detail & Related papers (2025-05-30T04:20:00Z) - Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement.<n>NAG restores effective negative guidance where CFG collapses while maintaining fidelity.<n>NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video)
arXiv Detail & Related papers (2025-05-27T13:30:46Z) - StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment [70.87096576708898]
We propose StarFT, a framework for fine-tuning zero-shot models to enhance robustness by preventing them from learning spuriosity.<n>StarFT boosts both worst-group and average accuracy by 14.30% and 3.02%, respectively, in the Waterbirds group shift scenario.
arXiv Detail & Related papers (2025-05-19T15:15:35Z) - Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation.<n> Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences.<n>We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z) - Negative-Prompt-driven Alignment for Generative Language Model [34.191590966148816]
We propose NEgative-prompt-driven AlignmenT to guide language models away from undesirable behaviors.
NEAT explicitly penalizes the model for producing harmful outputs, guiding it not only toward desirable behaviors but also steering it away from generating undesirable, biased responses.
Extensive experiments validate NEAT's effectiveness in significantly enhancing language models' alignment with human values and preferences.
arXiv Detail & Related papers (2024-10-16T03:30:09Z) - Extensive Self-Contrast Enables Feedback-Free Language Model Alignment [29.268993281961695]
Self-Contrast is a feedback-free large language model alignment method via exploiting extensive self-generated negatives.
With only supervised fine-tuning (SFT) targets, Self-Contrast harnesses a pre-trained embedding model to filter multiple negatives according to text similarity.
Our experiments with direct preference optimization (DPO) on three datasets show that, Self-Contrast could consistently outperform SFT and standard DPO training by large margins.
arXiv Detail & Related papers (2024-03-31T08:30:15Z) - Negating Negatives: Alignment with Human Negative Samples via Distributional Dispreference Optimization [37.8788435790632]
Large language models (LLMs) have revolutionized the role of AI, yet pose potential social risks.
Existing methods rely on high-quality positive-negative training pairs, suffering from noisy positive responses that are barely distinguishable from negative ones.
We propose Distributional Dispreference Optimization (D$2$O), which maximizes the discrepancy between dispreferred responses and the generated non-negative ones.
arXiv Detail & Related papers (2024-03-06T03:02:38Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text synthesis-to-speech.
Notably, we are first to apply flow models for plan generation in the offline reinforcement learning setting ax speedup in compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs)
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.