Unified Personalized Understanding, Generating and Editing
- URL: http://arxiv.org/abs/2601.06965v1
- Date: Sun, 11 Jan 2026 15:46:34 GMT
- Title: Unified Personalized Understanding, Generating and Editing
- Authors: Yu Zhong, Tianwei Lin, Ruike Zhu, Yuqian Yuan, Haoyu Zheng, Liang Liang, Wenqiao Zhang, Feifei Shao, Haoyuan Li, Wanggui He, Hao Jiang, Yueting Zhuang
- Abstract summary: We present OmniPersona, an end-to-end personalization framework for unified LMMs. It integrates personalized understanding, generation, and image editing within a single architecture. Experiments demonstrate that OmniPersona delivers competitive and robust performance across diverse personalization tasks.
- Score: 54.5563878110386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unified large multimodal models (LMMs) have achieved remarkable progress in general-purpose multimodal understanding and generation. However, they still operate under a ``one-size-fits-all'' paradigm and struggle to model user-specific concepts (e.g., generate a photo of \texttt{<maeve>}) in a consistent and controllable manner. Existing personalization methods typically rely on external retrieval, which is inefficient and poorly integrated into unified multimodal pipelines. Recent personalized unified models introduce learnable soft prompts to encode concept information, yet they either couple understanding and generation or depend on complex multi-stage training, leading to cross-task interference and ultimately to fuzzy or misaligned personalized knowledge. We present \textbf{OmniPersona}, an end-to-end personalization framework for unified LMMs that, for the first time, integrates personalized understanding, generation, and image editing within a single architecture. OmniPersona introduces structurally decoupled concept tokens, allocating dedicated subspaces for different tasks to minimize interference, and incorporates an explicit knowledge replay mechanism that propagates personalized attribute knowledge across tasks, enabling consistent personalized behavior. To systematically evaluate unified personalization, we propose \textbf{\texttt{OmniPBench}}, extending the public UnifyBench concept set with personalized editing tasks and cross-task evaluation protocols integrating understanding, generation, and editing. Experimental results demonstrate that OmniPersona delivers competitive and robust performance across diverse personalization tasks. We hope OmniPersona will serve as a strong baseline and spur further research on controllable, unified personalization.
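The abstract describes the structurally decoupled concept tokens only at a high level. Below is a minimal, hypothetical PyTorch sketch of how per-task concept-token subspaces could be wired; the class name, token counts, and injection point are illustrative assumptions, not OmniPersona's actual implementation.

```python
# A minimal sketch (not the authors' code) of "structurally decoupled concept
# tokens": each personalized concept owns separate learnable token banks for
# understanding, generation, and editing, so gradients from one task do not
# overwrite the subspace used by another. All names and sizes are illustrative.
import torch
import torch.nn as nn

class DecoupledConceptTokens(nn.Module):
    def __init__(self, hidden_dim: int = 64, tokens_per_task: int = 4):
        super().__init__()
        # One dedicated, independently trained subspace per task.
        self.banks = nn.ParameterDict({
            task: nn.Parameter(torch.randn(tokens_per_task, hidden_dim) * 0.02)
            for task in ("understanding", "generation", "editing")
        })

    def forward(self, prompt_embeds: torch.Tensor, task: str) -> torch.Tensor:
        # Prepend only the task-specific concept tokens to the prompt embeddings,
        # leaving the other two subspaces (and their gradients) untouched.
        concept = self.banks[task].unsqueeze(0).expand(prompt_embeds.size(0), -1, -1)
        return torch.cat([concept, prompt_embeds], dim=1)

# Toy usage: inject a concept's generation-specific tokens before a T2I forward pass.
tokens = DecoupledConceptTokens(hidden_dim=64, tokens_per_task=4)
prompt = torch.randn(2, 16, 64)            # (batch, seq_len, hidden_dim)
out = tokens(prompt, task="generation")    # -> shape (2, 20, 64)
```

Keeping the three banks as separate parameters means the understanding, generation, and editing objectives each update only their own subspace, which is the cross-task interference reduction the abstract points to.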
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z)
- Plug-and-Play Multi-Concept Adaptive Blending for High-Fidelity Text-to-Image Synthesis [0.0]
We introduce plug-and-play multi-concept blending for high-fidelity text-to-image (T2I) generation. Our method leverages guided appearance attention to faithfully reflect the intended appearance of each personalized concept. We also present a mask-guided noise mixing strategy that preserves the integrity of non-personalized regions.
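The mask-guided noise mixing strategy is only named in this summary, not specified. The following is a hedged toy sketch of what such a blend typically looks like; the function and variable names, shapes, and mask layout are illustrative assumptions rather than the paper's implementation.

```python
# A hedged illustration (not the paper's code) of mask-guided noise mixing:
# latents for personalized regions are blended with the original background
# latents under a binary mask, so non-personalized pixels stay untouched.
import torch

def mask_guided_mix(personalized_latents: torch.Tensor,
                    background_latents: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """Blend latents: mask == 1 where a personalized concept should appear."""
    return mask * personalized_latents + (1.0 - mask) * background_latents

pers = torch.randn(1, 4, 64, 64)   # latents carrying the personalized concept
bg = torch.randn(1, 4, 64, 64)     # original (non-personalized) latents
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0      # concept occupies the central region
mixed = mask_guided_mix(pers, bg, mask)  # background outside the mask is preserved exactly
```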
arXiv Detail & Related papers (2025-11-18T12:25:47Z)
- MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion [24.513096225720854]
We introduce a novel task, multi-view customization, which aims to jointly achieve multi-view pose control and customization. We propose MVCustom, a novel diffusion-based framework explicitly designed to achieve both multi-view consistency and customization fidelity.
arXiv Detail & Related papers (2025-10-15T16:00:26Z)
- Personalized Vision via Visual In-Context Learning [62.85784251383279]
We present PICO, a visual in-context learning framework for personalized vision. PICO infers the underlying transformation and applies it to new inputs without retraining. We also propose an attention-guided seed scorer that improves reliability via efficient inference scaling.
arXiv Detail & Related papers (2025-09-29T17:58:45Z)
- MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2025-03-24T16:32:17Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP). We develop a generic, personalized generative framework that can handle a wide range of personalized needs. Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.