PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching
- URL: http://arxiv.org/abs/2511.12998v1
- Date: Mon, 17 Nov 2025 05:39:15 GMT
- Title: PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching
- Authors: Zewei Chang, Zheng-Peng Duan, Jianxing Zhang, Chun-Le Guo, Siyu Liu, Hyungju Chun, Hyunhee Park, Zikun Liu, Chongyi Li
- Abstract summary: We propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. We develop a VLM-driven agent that can handle both strong and weak user instructions.
- Score: 54.3683137773426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image retouching aims to enhance visual quality while aligning with users' personalized aesthetic preferences. To address the challenge of balancing controllability and subjectivity, we propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. Using parameter maps containing attribute values in specific semantic regions as input, PerTouch constructs an explicit parameter-to-image mapping for fine-grained image retouching. To improve semantic boundary perception, we introduce semantic replacement and parameter perturbation mechanisms in the training process. To connect natural language instructions with visual control, we develop a VLM-driven agent that can handle both strong and weak user instructions. Equipped with mechanisms of feedback-driven rethinking and scene-aware memory, PerTouch better aligns with user intent and captures long-term preferences. Extensive experiments demonstrate each component's effectiveness and the superior performance of PerTouch in personalized image retouching. Code is available at: https://github.com/Auroral703/PerTouch.
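As a rough illustration of the parameter-map input described in the abstract, the sketch below builds a per-region conditioning map from a semantic segmentation mask and per-region attribute values. The attribute names, value ranges, and channel layout are assumptions for illustration only and are not taken from the PerTouch code.

```python
# Minimal sketch (not the official PerTouch implementation): construct a
# per-region parameter map that could condition a diffusion-based retoucher.
# Attribute set, value range [-1, 1], and one-channel-per-attribute layout
# are illustrative assumptions.
import numpy as np

ATTRIBUTES = ["exposure", "contrast", "saturation"]  # assumed attribute set

def build_parameter_map(seg_mask: np.ndarray,
                        region_params: dict,
                        default: float = 0.0) -> np.ndarray:
    """seg_mask: (H, W) integer label map from any semantic segmenter.
    region_params: {label: {attribute: value in [-1, 1]}}.
    Returns a (len(ATTRIBUTES), H, W) map with one channel per attribute."""
    h, w = seg_mask.shape
    pmap = np.full((len(ATTRIBUTES), h, w), default, dtype=np.float32)
    for label, params in region_params.items():
        region = seg_mask == label
        for c, attr in enumerate(ATTRIBUTES):
            pmap[c][region] = params.get(attr, default)
    return pmap

# Example: brighten the "sky" region (label 1), desaturate the "person" (label 2).
mask = np.zeros((256, 256), dtype=np.int64)
mask[:128] = 1   # top half: sky
mask[128:] = 2   # bottom half: person
cond = build_parameter_map(mask, {1: {"exposure": 0.5}, 2: {"saturation": -0.3}})
print(cond.shape)  # (3, 256, 256)
```

A map like this can be concatenated with the diffusion model's input so that each semantic region carries its own target attribute values, giving an explicit parameter-to-image mapping.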
Related papers
- BeautyGRPO: Aesthetic Alignment for Face Retouching via Dynamic Path Guidance and Fine-Grained Preference Modeling [29.77085426345252]
Face retouching requires removing subtle imperfections while preserving unique facial identity features, in order to enhance overall aesthetic appeal. Existing methods suffer from a fundamental trade-off: supervised learning on labeled data is constrained to pixel-level label mimicry, failing to capture complex subjective human aesthetic preferences. We propose BeautyGRPO, a reinforcement learning framework that aligns face retouching with human aesthetic preferences.
arXiv Detail & Related papers (2026-03-01T15:59:31Z) - ProxyImg: Towards Highly-Controllable Image Representation via Hierarchical Disentangled Proxy Embedding [44.20713526887855]
We propose a hierarchical proxy-based parametric image representation that disentangles semantic, geometric, and textural attributes into independent parameter spaces. Our method achieves state-of-the-art rendering fidelity with significantly fewer parameters, while enabling intuitive, interactive, and physically plausible manipulation.
arXiv Detail & Related papers (2026-02-02T09:53:45Z) - RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models [76.79706360982162]
We propose RetouchLLM, a training-free white-box image retouching system. It performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching; a minimal sketch of this idea follows.
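The sketch below illustrates the general code-based, training-free idea under stated assumptions: a vision-language model is assumed to emit a small JSON plan of named operations, which is then executed by a whitelist of simple image adjustments. The plan schema and operation names are illustrative, not RetouchLLM's actual interface.

```python
# Minimal sketch of code-based, multi-step retouching (illustrative only).
import json
import numpy as np

def adjust_exposure(img, value):   # img in [0, 1]; value in EV-like units
    return np.clip(img * (2.0 ** value), 0.0, 1.0)

def adjust_contrast(img, value):   # value > 1 increases contrast around 0.5
    return np.clip((img - 0.5) * value + 0.5, 0.0, 1.0)

OPS = {"exposure": adjust_exposure, "contrast": adjust_contrast}

def apply_plan(img: np.ndarray, plan_json: str) -> np.ndarray:
    """Apply a multi-step retouching plan, one interpretable step at a time."""
    for step in json.loads(plan_json):
        img = OPS[step["op"]](img, step["value"])
    return img

# A plan a VLM might emit for "make it brighter and punchier" (hypothetical):
plan = '[{"op": "exposure", "value": 0.4}, {"op": "contrast", "value": 1.15}]'
out = apply_plan(np.random.rand(512, 512, 3).astype(np.float32), plan)
```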
arXiv Detail & Related papers (2025-10-09T10:40:49Z) - ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning [76.2503352325492]
ControlThinker is a novel framework that employs a "comprehend-then-generate" paradigm. Latent semantics from control images are mined to enrich text prompts. This enriched semantic understanding then seamlessly aids image generation without the need for additional complex modifications.
arXiv Detail & Related papers (2025-06-04T05:56:19Z) - DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition [69.10628479553709]
We introduce DRC, a novel personalized image generation framework that enhances Large Multimodal Models (LMMs). DRC explicitly extracts user style preferences and semantic intentions from history images and the reference image, respectively. It involves two critical learning stages: 1) disentanglement learning, which employs a dual-tower disentangler to explicitly separate style and semantic features, optimized via a reconstruction-driven paradigm with difficulty-aware importance sampling; and 2) personalized modeling, which applies semantic-preserving augmentations to effectively adapt the disentangled representations for robust personalized generation.
arXiv Detail & Related papers (2025-04-24T08:10:10Z) - SPF-Portrait: Towards Pure Text-to-Portrait Customization with Semantic Pollution-Free Fine-Tuning [33.709835660394305]
SPF-Portrait is a pioneering work that purely understands customized target semantics while minimizing disruption to the original model. In SPF-Portrait, we design a dual-path contrastive learning pipeline, which introduces the original model as a behavioral alignment reference. It adaptively balances the behavioral alignment across different regions and the responsiveness of the target semantics.
arXiv Detail & Related papers (2025-04-01T03:37:30Z) - DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts [45.730449182899754]
We propose a diffusion-based retouching method named DiffRetouch.
Four image attributes are made adjustable to provide a user-friendly editing mechanism.
An affine bilateral grid and a contrastive learning scheme are introduced to handle texture distortion and control insensitivity, respectively; the bilateral-grid idea is sketched below.
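For context, the sketch below shows the generic affine bilateral grid technique (HDRNet-style slicing), not DiffRetouch's exact formulation: a coarse grid stores one 3x4 affine color transform per spatial cell and luminance bin, and the grid is sliced at full resolution using the input luminance as guidance, which keeps adjustments edge-aware and texture-preserving.

```python
# Minimal sketch of slicing and applying an affine bilateral grid (generic idea).
import numpy as np

def slice_and_apply(grid: np.ndarray, img: np.ndarray) -> np.ndarray:
    """grid: (GH, GW, GL, 3, 4) affine transforms; img: (H, W, 3) in [0, 1]."""
    gh, gw, gl = grid.shape[:3]
    h, w = img.shape[:2]
    lum = img.mean(axis=-1)                                    # guidance: luminance
    # Nearest-neighbour lookup keeps the sketch short; real systems trilinearly
    # interpolate for smooth, edge-aware behaviour.
    yi = np.clip(np.arange(h) * gh // h, 0, gh - 1)
    xi = np.clip(np.arange(w) * gw // w, 0, gw - 1)
    li = np.clip((lum * gl).astype(int), 0, gl - 1)
    A = grid[yi[:, None], xi[None, :], li]                     # (H, W, 3, 4)
    rgb1 = np.concatenate([img, np.ones((h, w, 1))], axis=-1)  # homogeneous colors
    out = np.einsum('hwij,hwj->hwi', A, rgb1)
    return np.clip(out, 0.0, 1.0)

# Identity grid: the output equals the input regardless of where we slice.
grid = np.tile(np.eye(3, 4, dtype=np.float32), (16, 16, 8, 1, 1))
img = np.random.rand(256, 256, 3).astype(np.float32)
assert np.allclose(slice_and_apply(grid, img), img, atol=1e-6)
```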
arXiv Detail & Related papers (2024-07-04T09:09:42Z) - Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with a few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z) - Controllable Image Synthesis via SegVAE [89.04391680233493]
A semantic map is a commonly used intermediate representation for conditional image generation.
In this work, we specifically target generating semantic maps given a label-set consisting of desired categories.
The proposed framework, SegVAE, synthesizes semantic maps in an iterative manner using a conditional variational autoencoder.
arXiv Detail & Related papers (2020-07-16T15:18:53Z)