CRAFT-LoRA: Content-Style Personalization via Rank-Constrained Adaptation and Training-Free Fusion
- URL: http://arxiv.org/abs/2602.18936v3
- Date: Fri, 27 Feb 2026 04:20:24 GMT
- Title: CRAFT-LoRA: Content-Style Personalization via Rank-Constrained Adaptation and Training-Free Fusion
- Authors: Yu Li, Yujun Cai, Chi Zhang,
- Abstract summary: Low-Rank Adaptation (LoRA) offers an efficient personalization approach, with potential for precise control through combining LoRA weights on different concepts. Existing combination techniques face persistent challenges: entanglement between content and style representations, insufficient guidance for controlling elements' influence, and unstable weight fusion that often requires additional training. We address these limitations through CRAFT-LoRA, with three complementary components: (1) rank-constrained backbone fine-tuning that injects low-rank projection residuals to encourage learning decoupled content and style subspaces; (2) a prompt-guided approach featuring an expert encoder with specialized branches that enables semantic extension and precise control
- Score: 27.087994191559904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalized image generation requires effectively balancing content fidelity with stylistic consistency when synthesizing images based on text and reference examples. Low-Rank Adaptation (LoRA) offers an efficient personalization approach, with potential for precise control through combining LoRA weights on different concepts. However, existing combination techniques face persistent challenges: entanglement between content and style representations, insufficient guidance for controlling elements' influence, and unstable weight fusion that often requires additional training. We address these limitations through CRAFT-LoRA, with three complementary components: (1) rank-constrained backbone fine-tuning that injects low-rank projection residuals to encourage learning decoupled content and style subspaces; (2) a prompt-guided approach featuring an expert encoder with specialized branches that enables semantic extension and precise control through selective adapter aggregation; and (3) a training-free, timestep-dependent classifier-free guidance scheme that enhances generation stability by strategically adjusting noise predictions across diffusion steps. Our method significantly improves content-style disentanglement, enables flexible semantic control over LoRA module combinations, and achieves high-fidelity generation without additional retraining overhead.
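The abstract's third component, timestep-dependent classifier-free guidance, can be illustrated with a minimal sketch. The abstract does not specify the schedule, so the linear annealing below (stronger guidance at high-noise timesteps, weaker near the end) and the `w_max`/`w_min` values are assumptions, not the paper's actual scheme:

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, t, T, w_max=7.5, w_min=2.0):
    """Timestep-dependent classifier-free guidance (illustrative sketch).

    Standard CFG uses a fixed guidance weight w; here w is annealed
    across diffusion steps. The linear schedule and the w_max/w_min
    defaults are hypothetical choices for illustration only.
    """
    # Anneal the guidance weight from w_max (t = T, pure noise)
    # down to w_min (t = 0, nearly denoised).
    w = w_min + (w_max - w_min) * (t / T)
    # Classic CFG combination of unconditional and conditional predictions.
    return eps_uncond + w * (eps_cond - eps_uncond)
```

At `t = T` the conditional prediction is amplified most strongly; real schemes may instead modulate the weight per adapter or non-linearly.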
Related papers
- Dynamic Training-Free Fusion of Subject and Style LoRAs [38.73465144699025]
We propose a training-free fusion framework that operates throughout the generation process. Our approach consistently outperforms state-of-the-art LoRA fusion methods both qualitatively and quantitatively.
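The training-free fusion these papers describe starts from a simple baseline: adding scaled low-rank residuals from a content LoRA and a style LoRA onto a frozen base weight. The sketch below shows only that naive baseline; the per-adapter scales `alpha_c`/`alpha_s` are hypothetical knobs, and the actual methods modulate fusion per layer or per timestep rather than with fixed scalars:

```python
import numpy as np

def fuse_loras(W, content, style, alpha_c=0.7, alpha_s=0.7):
    """Naive training-free fusion of a content LoRA and a style LoRA.

    Each LoRA is a (B, A) pair of low-rank factors whose product is a
    residual added to the frozen base weight W. The fixed scalar scales
    here are an illustrative assumption, not either paper's method.
    """
    B_c, A_c = content
    B_s, A_s = style
    # W' = W + alpha_c * B_c A_c + alpha_s * B_s A_s
    return W + alpha_c * (B_c @ A_c) + alpha_s * (B_s @ A_s)
```

This baseline is exactly where the entanglement problem arises: with fixed scalars, content and style residuals compete in the same subspace, which motivates the dynamic schemes above.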
arXiv Detail & Related papers (2026-02-17T12:42:30Z)
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z)
- UnHype: CLIP-Guided Hypernetworks for Dynamic LoRA Unlearning [3.8373805990749266]
UnHype is a framework that incorporates hypernetworks into single- and multi-concept Low-Rank Adaptation (LoRA) training. During inference, the hypernetwork dynamically generates adaptive LoRA weights based on the CLIP embedding. We evaluate UnHype across several challenging tasks, including object erasure, celebrity erasure, and explicit content removal.
arXiv Detail & Related papers (2026-02-03T11:37:08Z)
- An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields. They face challenges such as complex design specifications and scalability issues with large datasets. This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z)
- Steerable Adversarial Scenario Generation through Test-Time Preference Alignment [58.37104890690234]
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. We introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining.
arXiv Detail & Related papers (2025-09-24T13:27:35Z)
- AutoLoRA: Automatic LoRA Retrieval and Fine-Grained Gated Fusion for Text-to-Image Generation [32.46570968627392]
Low-rank adaptation (LoRA) has demonstrated efficacy in enabling model customization with minimal parameter overhead. We introduce a novel framework that enables semantic-driven LoRA retrieval and dynamic aggregation. Our approach achieves significant improvement in image generation performance.
arXiv Detail & Related papers (2025-08-04T06:36:00Z)
- Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z)
- MultLFG: Training-free Multi-LoRA composition using Frequency-domain Guidance [44.4839416120775]
MultLFG is a framework for training-free multi-LoRA composition. It uses frequency-domain guidance to achieve adaptive fusion of multiple LoRAs. It substantially enhances compositional fidelity and image quality across various styles and concept sets.
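The frequency-domain idea can be conveyed with a toy sketch: split two signals in the Fourier domain and keep low frequencies from one, high frequencies from the other. MultLFG's actual guidance operates on LoRA noise predictions during sampling; the radial low-pass mask and the `cutoff` value below are illustrative assumptions only:

```python
import numpy as np

def frequency_blend(x_a, x_b, cutoff=0.25):
    """Toy frequency-domain blend: low frequencies from x_a, high from x_b.

    A radial mask in the 2D Fourier domain selects which source each
    frequency band comes from. The cutoff radius is an arbitrary choice
    for illustration, not a value from the paper.
    """
    Fa, Fb = np.fft.fft2(x_a), np.fft.fft2(x_b)
    h, w = x_a.shape
    # Radial frequency coordinates in cycles/sample, range [-0.5, 0.5).
    yy, xx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    low = np.sqrt(yy**2 + xx**2) < cutoff
    # Take coarse structure from Fa, fine detail from Fb.
    return np.real(np.fft.ifft2(np.where(low, Fa, Fb)))
```

Blending a signal with itself returns the signal unchanged, which is a quick sanity check that the mask partitions the spectrum cleanly.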
arXiv Detail & Related papers (2025-05-26T21:05:28Z) - Continuous Knowledge-Preserving Decomposition with Adaptive Layer Selection for Few-Shot Class-Incremental Learning [73.59672160329296]
CKPD-FSCIL is a unified framework that unlocks the underutilized capacity of pretrained weights.<n>Our method consistently outperforms state-of-the-art approaches in both adaptability and knowledge retention.
arXiv Detail & Related papers (2025-01-09T07:18:48Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control [43.96257216397601]
We propose a new plug-and-play solution for training-free personalization of diffusion models.
RB-Modulation is built on a novel optimal controller where a style descriptor encodes the desired attributes.
A cross-attention-based feature aggregation scheme allows RB-Modulation to decouple content and style from the reference image.
arXiv Detail & Related papers (2024-05-27T17:51:08Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework: the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.