RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
- URL: http://arxiv.org/abs/2510.08054v2
- Date: Fri, 10 Oct 2025 08:25:44 GMT
- Title: RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
- Authors: Moon Ye-Bin, Roy Miles, Tae-Hyun Oh, Ismail Elezi, Jiankang Deng
- Abstract summary: We propose RetouchLLM, a training-free white-box image retouching system. It performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching.
- Score: 76.79706360982162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image retouching not only enhances visual quality but also serves as a means of expressing personal preferences and emotions. However, existing learning-based approaches require large-scale paired data and operate as black boxes, making the retouching process opaque and limiting their adaptability to diverse, user- or image-specific adjustments. In this work, we propose RetouchLLM, a training-free white-box image retouching system, which requires no training data and performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching, allowing exploration of diverse adjustment paths. It comprises two main modules: a visual critic that identifies differences between the input and reference images, and a code generator that produces executable code. Experiments demonstrate that our approach generalizes well across diverse retouching styles, while natural language-based user interaction enables interpretable and controllable adjustments tailored to user intent.
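To make the idea of white-box, code-based retouching concrete, below is a minimal, self-contained Python sketch of the kind of executable adjustment code such a system operates on. This is an illustration under stated assumptions, not the authors' implementation: the operation names (`adjust_exposure`, `adjust_contrast`, `adjust_saturation`) and the hard-coded step sequence are hypothetical. In RetouchLLM, the steps would instead be proposed iteratively by the VLM-based visual critic and code generator.

```python
# Hypothetical sketch of code-based retouching in the spirit of the
# abstract. The adjustment sequence is hard-coded here purely for
# illustration; in the actual system a visual critic and code
# generator would produce each step. All function names are assumptions.

import numpy as np

def adjust_exposure(img: np.ndarray, stops: float) -> np.ndarray:
    """Scale linear intensity by 2**stops."""
    return np.clip(img * (2.0 ** stops), 0.0, 1.0)

def adjust_contrast(img: np.ndarray, amount: float) -> np.ndarray:
    """Push values away from mid-gray; amount > 1 increases contrast."""
    return np.clip((img - 0.5) * amount + 0.5, 0.0, 1.0)

def adjust_saturation(img: np.ndarray, amount: float) -> np.ndarray:
    """Blend each pixel between its luma and its original color."""
    luma = img @ np.array([0.299, 0.587, 0.114])
    return np.clip(luma[..., None] + (img - luma[..., None]) * amount, 0.0, 1.0)

# Each step is ordinary, inspectable code, so the edit history is
# interpretable and can be replayed on the full-resolution image.
steps = [
    ("exposure", lambda im: adjust_exposure(im, 0.3)),
    ("contrast", lambda im: adjust_contrast(im, 1.1)),
    ("saturation", lambda im: adjust_saturation(im, 1.2)),
]

img = np.random.rand(256, 256, 3).astype(np.float32)  # stand-in for a photo
for name, op in steps:
    img = op(img)
    # In the real system, the visual critic would compare `img` against
    # a reference here, and the generator would emit the next step.
    print(f"applied {name}: mean={img.mean():.3f}")
```

Because every edit is plain code rather than a learned black-box mapping, the adjustment path stays transparent and user-editable, which is the property the abstract refers to as white-box retouching.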
Related papers
- Towards Generalized Multi-Image Editing for Unified Multimodal Models [56.620038824933566]
Unified Multimodal Models (UMMs) integrate multimodal understanding and generation. However, UMMs are limited in maintaining visual consistency and disambiguating visual cues when referencing details across multiple input images. We propose a scalable multi-image editing framework for UMMs that explicitly distinguishes image identities and generalizes to variable input counts.
arXiv Detail & Related papers (2026-01-09T06:42:49Z) - UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation [51.31795451147935]
We present a unified generative model that supports visual understanding and visual generation within a single pixel-to-pixel diffusion framework. Our goal is to achieve unification along three axes: the model, the tasks, and the representations. Experiments on text-to-image synthesis and image-to-text understanding demonstrate strong cross-modal alignment.
arXiv Detail & Related papers (2025-11-21T03:02:10Z) - PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching [54.3683137773426]
We propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. We develop a VLM-driven agent that can handle both strong and weak user instructions.
arXiv Detail & Related papers (2025-11-17T05:39:15Z) - Image Reconstruction as a Tool for Feature Analysis [2.0249250133493195]
We propose a novel approach for interpreting vision features via image reconstruction. We show that encoders pre-trained on image-based tasks retain significantly more image information than those trained on non-image tasks. Our approach can be applied to any vision encoder, shedding light on the inner structure of its feature space.
arXiv Detail & Related papers (2025-06-09T14:32:18Z) - INRetouch: Context Aware Implicit Neural Representation for Photography Retouching [54.17599183365242]
We propose a novel retouch transfer approach that learns from professional edits through before-after image pairs. We develop a context-aware Implicit Neural Representation that learns to apply edits adaptively based on image content and context. Our method extracts implicit transformations from reference edits and adaptively applies them to new images.
arXiv Detail & Related papers (2024-12-05T03:31:48Z) - LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z) - DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts [45.730449182899754]
We propose a diffusion-based retouching method named DiffRetouch.
Four image attributes are made adjustable to provide a user-friendly editing mechanism.
An affine bilateral grid and a contrastive learning scheme are introduced to handle texture distortion and control insensitivity, respectively.
arXiv Detail & Related papers (2024-07-04T09:09:42Z) - Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications.
By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content.
These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z) - Learning Diverse Tone Styles for Image Retouching [73.60013618215328]
We propose to learn diverse image retouching with normalizing flow-based architectures.
The joint-training pipeline comprises a style encoder, a conditional RetouchNet, and an image tone style normalizing flow (TSFlow) module.
Our proposed method performs favorably against state-of-the-art methods and is effective in generating diverse results.
arXiv Detail & Related papers (2022-07-12T09:49:21Z) - Enhance Images as You Like with Unpaired Learning [8.104571453311442]
We propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space.
Our network learns to generate a collection of enhanced images from a given input conditioned on various reference images.
Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets.
arXiv Detail & Related papers (2021-10-04T03:00:44Z) - Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work, which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)