RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
- URL: http://arxiv.org/abs/2510.08054v2
- Date: Fri, 10 Oct 2025 08:25:44 GMT
- Title: RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
- Authors: Moon Ye-Bin, Roy Miles, Tae-Hyun Oh, Ismail Elezi, Jiankang Deng
- Abstract summary: We propose RetouchLLM, a training-free white-box image retouching system. It performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching.
- Score: 76.79706360982162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image retouching not only enhances visual quality but also serves as a means of expressing personal preferences and emotions. However, existing learning-based approaches require large-scale paired data and operate as black boxes, making the retouching process opaque and limiting their adaptability to diverse, user- or image-specific adjustments. In this work, we propose RetouchLLM, a training-free white-box image retouching system, which requires no training data and performs interpretable, code-based retouching directly on high-resolution images. Our framework progressively enhances the image in a manner similar to how humans perform multi-step retouching, allowing exploration of diverse adjustment paths. It comprises two main modules: a visual critic that identifies differences between the input and reference images, and a code generator that produces executable code. Experiments demonstrate that our approach generalizes well across diverse retouching styles, while natural language-based user interaction enables interpretable and controllable adjustments tailored to user intent.
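To make the idea of white-box, code-based retouching concrete, below is a minimal, self-contained Python sketch of the kind of executable adjustment code such a system operates on. This is an illustration under stated assumptions, not the authors' implementation: the operation names (`adjust_exposure`, `adjust_contrast`, `adjust_saturation`) and the hard-coded step sequence are hypothetical. In RetouchLLM, the steps would instead be proposed iteratively by the VLM-based visual critic and code generator.

```python
# Hypothetical sketch of code-based retouching in the spirit of the
# abstract. The adjustment sequence is hard-coded here purely for
# illustration; in the actual system a visual critic and code
# generator would produce each step. All function names are assumptions.

import numpy as np

def adjust_exposure(img: np.ndarray, stops: float) -> np.ndarray:
    """Scale linear intensity by 2**stops."""
    return np.clip(img * (2.0 ** stops), 0.0, 1.0)

def adjust_contrast(img: np.ndarray, amount: float) -> np.ndarray:
    """Push values away from mid-gray; amount > 1 increases contrast."""
    return np.clip((img - 0.5) * amount + 0.5, 0.0, 1.0)

def adjust_saturation(img: np.ndarray, amount: float) -> np.ndarray:
    """Blend each pixel between its luma and its original color."""
    luma = img @ np.array([0.299, 0.587, 0.114])
    return np.clip(luma[..., None] + (img - luma[..., None]) * amount, 0.0, 1.0)

# Each step is ordinary, inspectable code, so the edit history is
# interpretable and can be replayed on the full-resolution image.
steps = [
    ("exposure", lambda im: adjust_exposure(im, 0.3)),
    ("contrast", lambda im: adjust_contrast(im, 1.1)),
    ("saturation", lambda im: adjust_saturation(im, 1.2)),
]

img = np.random.rand(256, 256, 3).astype(np.float32)  # stand-in for a photo
for name, op in steps:
    img = op(img)
    # In the real system, the visual critic would compare `img` against
    # a reference here, and the generator would emit the next step.
    print(f"applied {name}: mean={img.mean():.3f}")
```

Because every edit is plain code rather than a learned black-box mapping, the adjustment path stays transparent and user-editable, which is the property the abstract refers to as white-box retouching.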
Related papers
- Towards Generalized Multi-Image Editing for Unified Multimodal Models [56.620038824933566]
Unified Multimodal Models (UMMs) integrate multimodal understanding and generation. However, UMMs are limited in maintaining visual consistency and disambiguating visual cues when referencing details across multiple input images. We propose a scalable multi-image editing framework for UMMs that explicitly distinguishes image identities and generalizes to variable input counts.
arXiv Detail & Related papers (2026-01-09T06:42:49Z) - UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation [51.31795451147935]
We present a unified generative model that supports visual understanding and visual generation within a single pixel-to-pixel diffusion framework. Our goal is to achieve unification along three axes: the model, the tasks, and the representations. Experiments on text-to-image synthesis and image-to-text understanding demonstrate strong cross-modal alignment.
arXiv Detail & Related papers (2025-11-21T03:02:10Z) - PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching [54.3683137773426]
We propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. We develop a VLM-driven agent that can handle both strong and weak user instructions.
arXiv Detail & Related papers (2025-11-17T05:39:15Z) - Image Reconstruction as a Tool for Feature Analysis [2.0249250133493195]
We propose a novel approach for interpreting vision features via image reconstruction. We show that encoders pre-trained on image-based tasks retain significantly more image information than those trained on non-image tasks. Our approach can be applied to any vision encoder, shedding light on the inner structure of its feature space.
arXiv Detail & Related papers (2025-06-09T14:32:18Z) - INRetouch: Context Aware Implicit Neural Representation for Photography Retouching [54.17599183365242]
We propose a novel retouch transfer approach that learns from professional edits through before-after image pairs. We develop a context-aware Implicit Neural Representation that learns to apply edits adaptively based on image content and context. Our method extracts implicit transformations from reference edits and adaptively applies them to new images.
arXiv Detail & Related papers (2024-12-05T03:31:48Z) - LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z) - DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts [45.730449182899754]
We propose a diffusion-based retouching method named DiffRetouch.
Four image attributes are made adjustable to provide a user-friendly editing mechanism.
An affine bilateral grid and a contrastive learning scheme are introduced to handle texture distortion and control insensitivity, respectively.
arXiv Detail & Related papers (2024-07-04T09:09:42Z) - Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications.
By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content.
These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z) - Learning Diverse Tone Styles for Image Retouching [73.60013618215328]
We propose to learn diverse image retouching with normalizing flow-based architectures.
The joint-training pipeline comprises a style encoder, a conditional RetouchNet, and an image tone style normalizing flow (TSFlow) module.
Our proposed method performs favorably against state-of-the-art methods and is effective in generating diverse results.
arXiv Detail & Related papers (2022-07-12T09:49:21Z) - Enhance Images as You Like with Unpaired Learning [8.104571453311442]
We propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space.
Our network learns to generate a collection of enhanced images from a given input conditioned on various reference images.
Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets.
arXiv Detail & Related papers (2021-10-04T03:00:44Z) - Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work, which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)