Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
- URL: http://arxiv.org/abs/2510.08532v1
- Date: Thu, 09 Oct 2025 17:51:03 GMT
- Title: Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
- Authors: Rishubh Parihar, Or Patashnik, Daniil Ostashev, R. Venkatesh Babu, Daniel Cohen-Or, Kuan-Chieh Wang
- Abstract summary: Kontinuous Kontext is an instruction-driven editing model that provides a new dimension of control over edit strength. A lightweight projector network maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models.
- Score: 76.44219733285898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-based image editing offers a powerful and intuitive way to manipulate images through natural language. Yet, relying solely on text instructions limits fine-grained control over the extent of edits. We introduce Kontinuous Kontext, an instruction-driven editing model that provides a new dimension of control over edit strength, enabling users to adjust edits gradually from no change to a fully realized result in a smooth and continuous manner. Kontinuous Kontext extends a state-of-the-art image editing model to accept an additional input, a scalar edit strength, which is then paired with the edit instruction, enabling explicit control over the extent of the edit. To inject this scalar information, we train a lightweight projector network that maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models, followed by a filtering stage to ensure quality and consistency. Kontinuous Kontext provides a unified approach for fine-grained control over edit strength for instruction-driven editing, from subtle to strong, across diverse operations such as stylization, attribute, material, background, and shape changes, without requiring attribute-specific training.
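The scalar-to-modulation mapping the abstract describes is straightforward to prototype. The PyTorch sketch below shows one plausible reading: a small MLP takes the pooled instruction embedding together with the strength scalar and predicts scale/shift coefficients in an AdaLN-style modulation space. All names, dimensions, and the (scale, shift) parameterization are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class StrengthProjector(nn.Module):
    """Maps (instruction embedding, edit strength) to modulation coefficients."""

    def __init__(self, instr_dim: int = 768, mod_dim: int = 3072, hidden: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(instr_dim + 1, hidden),   # +1 input for the strength scalar
            nn.SiLU(),
            nn.Linear(hidden, 2 * mod_dim),     # predicts a (scale, shift) pair
        )

    def forward(self, instr_emb: torch.Tensor, strength: torch.Tensor):
        # instr_emb: (B, instr_dim); strength: (B, 1), with 0 = no edit, 1 = full edit
        coeffs = self.mlp(torch.cat([instr_emb, strength], dim=-1))
        scale, shift = coeffs.chunk(2, dim=-1)
        return scale, shift  # applied on top of the frozen backbone's modulation

# Sweeping the scalar yields a smooth family of edits for a single instruction.
projector = StrengthProjector()
instr_emb = torch.randn(1, 768)  # placeholder pooled instruction embedding
for s in torch.linspace(0.0, 1.0, steps=5):
    scale, shift = projector(instr_emb, s.view(1, 1))
```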
Related papers
- Instruction-based Image Editing with Planning, Reasoning, and Generation [52.0364486403062]
Prior work utilizes a chain of large language models, object segmentation models, and editing models for this task. We aim to bridge understanding and generation via a new multi-modality model that provides intelligent abilities to instruction-based image editing models. Our method has competitive editing abilities on complex real-world images.
arXiv Detail & Related papers (2026-02-26T04:56:02Z)
- NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing [12.728322570816248]
We introduce NumeriKontrol, a framework that allows users to adjust image attributes using continuous attribute values with common units. Thanks to its task-separated design, our approach supports zero-shot multi-condition editing. We synthesize precise training data from reliable sources, including high-fidelity DSLR cameras.
arXiv Detail & Related papers (2025-11-28T11:43:52Z)
- SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control [50.76070785417023]
We introduce SliderEdit, a framework for continuous image editing with fine-grained, interpretable instruction control. Given a multi-part edit instruction, SliderEdit disentangles the individual instructions and exposes each as a globally trained slider. Our results pave the way for interactive, instruction-driven image manipulation with continuous and compositional control.
arXiv Detail & Related papers (2025-11-12T20:21:37Z)
- Group Relative Attention Guidance for Image Editing [38.299491082179905]
Group Relative Attention Guidance (GRAG) is a simple yet effective method that modulates the model's focus on the input image relative to the editing instruction. Our code will be released at https://www.littlemisfit.com/little-misfit/GRAG-Image-Editing.
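The summary does not spell out the mechanism, but the title suggests re-weighting attention between image and instruction tokens. The sketch below is a guess at that idea, an additive log-bias on attention logits that scales how strongly queries attend to image tokens; the grouping and guidance rule are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def reweighted_attention(q, k, v, image_key_mask, gamma=1.5):
    """Scale attention toward (gamma > 1) or away from (gamma < 1) image tokens."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # Adding log(gamma) to a logit multiplies its pre-normalization weight by gamma.
    scores = scores + torch.log(torch.tensor(gamma)) * image_key_mask
    return F.softmax(scores, dim=-1) @ v

q, k, v = torch.randn(1, 10, 64), torch.randn(1, 20, 64), torch.randn(1, 20, 64)
# Assume the first 12 of 20 key/value tokens come from the input image, the rest from text.
image_key_mask = torch.cat([torch.ones(1, 1, 12), torch.zeros(1, 1, 8)], dim=-1)
out = reweighted_attention(q, k, v, image_key_mask)  # (1, 10, 64)
```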
arXiv Detail & Related papers (2025-10-28T17:22:44Z)
- SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder [52.754326452329956]
We introduce a method for disentangled and continuous editing through token-level manipulation of text embeddings. The edits are applied by manipulating the embeddings along carefully chosen directions, which control the strength of the target attribute. Our method operates directly on text embeddings without modifying the diffusion process, making it model-agnostic and broadly applicable to various image backbones.
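The direction-based manipulation described here reduces to a vector offset on the relevant token embedding, with the step size acting as a continuous strength knob. A minimal illustration, where the attribute direction is a random placeholder rather than one extracted by SAEdit's sparse autoencoder:

```python
import torch

def edit_token_embedding(emb: torch.Tensor, direction: torch.Tensor, strength: float) -> torch.Tensor:
    """Shift a token embedding along a unit-norm attribute direction."""
    direction = direction / direction.norm()
    return emb + strength * direction

token_emb = torch.randn(768)   # embedding of the token being edited
attr_dir = torch.randn(768)    # placeholder for a learned attribute direction
subtle = edit_token_embedding(token_emb, attr_dir, strength=0.5)
strong = edit_token_embedding(token_emb, attr_dir, strength=3.0)
```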
arXiv Detail & Related papers (2025-10-06T17:51:04Z)
- Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent [38.61468007698179]
We propose a descriptive-prompt-based editing framework, named DescriptiveEdit. The core idea is to re-frame 'instruction-based image editing' as 'reference-image-based text-to-image generation'.
arXiv Detail & Related papers (2025-08-28T07:45:08Z)
- InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow [19.972879378697215]
We propose a fast text-guided image editing method called InstantEdit based on the RectifiedFlow framework. Our approach leverages the straight sampling trajectories of RectifiedFlow by introducing a specialized inversion strategy called PerRFI. We also propose a novel regeneration method, Inversion Latent Injection, which effectively reuses latent information obtained during inversion to facilitate more coherent and detailed regeneration.
arXiv Detail & Related papers (2025-08-08T05:38:17Z)
- AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
- ControlEdit: A MultiModal Local Clothing Image Editing Method [3.6604114810930946]
Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using data such as textual descriptions and visual images as control conditions.
We propose ControlEdit, a new image editing method that casts clothing image editing as multimodal-guided local inpainting of clothing images.
arXiv Detail & Related papers (2024-09-23T05:34:59Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
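The adjustable-influence idea amounts to optimising the edited latent under a user-weighted sum of per-condition losses. A schematic sketch, with placeholder losses standing in for the paper's actual text, pose, and scribble terms:

```python
import torch

def combined_loss(latent, loss_fns, weights):
    """User-weighted sum of per-condition editing losses."""
    return sum(w * fn(latent) for fn, w in zip(loss_fns, weights))

text_loss = lambda z: z.pow(2).mean()   # placeholder for a text-alignment loss
pose_loss = lambda z: z.abs().mean()    # placeholder for a pose-matching loss

z = torch.randn(4, 64, 64, requires_grad=True)  # latent optimised at inference time
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(10):
    opt.zero_grad()
    loss = combined_loss(z, [text_loss, pose_loss], weights=[1.0, 0.5])
    loss.backward()
    opt.step()
```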
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
arXiv Detail & Related papers (2023-07-02T09:11:09Z)