SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing
- URL: http://arxiv.org/abs/2512.14140v1
- Date: Tue, 16 Dec 2025 06:50:44 GMT
- Title: SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing
- Authors: Han Zou, Yan Zhang, Ruiqi Yu, Cong Xie, Jie Huang, Zhenpeng Zhan
- Abstract summary: We present SketchAssist, an interactive sketch drawing assistant that accelerates creation by unifying instruction-guided global edits with line-guided region redrawing. To enable this assistant at scale, we introduce a controllable data generation pipeline that (i) constructs attribute-addition sequences from attribute-free base sketches, (ii) forms multi-step edit chains via cross-sequence sampling, and (iii) expands stylistic coverage with a style-preserving attribute-removal model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sketch editing is central to digital illustration, yet existing image editing systems struggle to preserve the sparse, style-sensitive structure of line art while supporting both high-level semantic changes and precise local redrawing. We present SketchAssist, an interactive sketch drawing assistant that accelerates creation by unifying instruction-guided global edits with line-guided region redrawing, while keeping unrelated regions and overall composition intact. To enable this assistant at scale, we introduce a controllable data generation pipeline that (i) constructs attribute-addition sequences from attribute-free base sketches, (ii) forms multi-step edit chains via cross-sequence sampling, and (iii) expands stylistic coverage with a style-preserving attribute-removal model applied to diverse sketches. Building on this data, SketchAssist employs a unified sketch editing framework with minimal changes to DiT-based editors. We repurpose the RGB channels to encode the inputs, enabling seamless switching between instruction-guided edits and line-guided redrawing within a single input interface. To further specialize behavior across modes, we integrate a task-guided mixture-of-experts into LoRA layers, routing by text and visual cues to improve semantic controllability, structural fidelity, and style preservation. Extensive experiments show state-of-the-art results on both tasks, with superior instruction adherence and style/structure preservation compared to recent baselines. Together, our dataset and SketchAssist provide a practical, controllable assistant for sketch creation and revision.
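The task-guided mixture-of-experts over LoRA layers described in the abstract can be sketched as follows. The expert count, rank, routing input, and all module names here are illustrative assumptions, not the paper's released code: a small router scores a few low-rank experts from a pooled task embedding (e.g. text and visual cues) and blends their updates into a frozen base projection.

```python
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    """A linear layer augmented with a small mixture of LoRA experts.

    A router scores each expert from a task embedding, and the gated
    low-rank updates are added to the frozen base projection. Sizes and
    names are illustrative, not the paper's implementation.
    """

    def __init__(self, in_dim, out_dim, num_experts=4, rank=8, task_dim=32):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        # per-expert LoRA factors; `up` starts at zero so training begins
        # from the unmodified base behavior
        self.down = nn.Parameter(torch.randn(num_experts, in_dim, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, out_dim))
        self.router = nn.Linear(task_dim, num_experts)

    def forward(self, x, task_emb):
        # x: (batch, in_dim); task_emb: (batch, task_dim)
        gates = torch.softmax(self.router(task_emb), dim=-1)      # (batch, E)
        # each expert's low-rank update, batched: (batch, E, out_dim)
        delta = torch.einsum("bi,eir,ero->beo", x, self.down, self.up)
        return self.base(x) + torch.einsum("be,beo->bo", gates, delta)
```

Routing by a task embedding rather than per-token keeps the gate cheap and lets the two modes (instruction-guided edits vs. line-guided redrawing) select different expert mixtures.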
Related papers
- Multi-Level Conditioning by Pairing Localized Text and Sketch for Fashion Image Generation [14.962452069195544]
We present LOcalized Text and Sketch with multi-level guidance (LOTS). LOTS combines global sketch guidance with multiple localized sketch-text pairs. We develop Sketchy, the first fashion dataset where multiple text-sketch pairs are provided per image.
arXiv Detail & Related papers (2026-02-20T16:07:31Z) - RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing [80.70169829264812]
RePlan is a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions. The editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting.
arXiv Detail & Related papers (2025-12-18T18:34:23Z) - Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control [52.87568958372421]
Follow-Your-Shape is a training-free and mask-free framework that supports precise and controllable editing of object shapes. We compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. Our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
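The Trajectory Divergence Map idea can be illustrated with a minimal sketch: compare per-token velocities along the inversion and denoising trajectories, and treat tokens with large divergence as the editable region. The shapes, averaging over steps, and min-max normalization below are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def trajectory_divergence_map(inv_velocities, den_velocities):
    """Token-wise divergence between inversion and denoising velocities.

    Both inputs have shape (steps, tokens, dim): the per-step velocity
    predictions along the two trajectories. Returns one score per token,
    normalized to [0, 1] so it can be thresholded into a soft edit mask.
    """
    diff = inv_velocities - den_velocities          # (steps, tokens, dim)
    per_step = np.linalg.norm(diff, axis=-1)        # (steps, tokens)
    tdm = per_step.mean(axis=0)                     # (tokens,)
    return (tdm - tdm.min()) / (tdm.max() - tdm.min() + 1e-8)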
arXiv Detail & Related papers (2025-08-11T16:10:00Z) - StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion [13.862427684807486]
StrokeFusion is a two-stage framework for vector sketch generation. It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space. It exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation.
arXiv Detail & Related papers (2025-03-31T06:03:03Z) - Recovering Partially Corrupted Objects via Sketch-Guided Bidirectional Feature Interaction [16.03488741913531]
Text-guided diffusion models provide high-level semantic guidance through text prompts. However, they often lack precise pixel-level spatial control in partially corrupted objects. We propose a sketch-guided bidirectional feature interaction framework built upon a pretrained Stable Diffusion model.
arXiv Detail & Related papers (2025-03-10T08:34:31Z) - UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint [87.20985852686785]
We propose an unsupervised instruction-based image editing approach that removes the need for ground-truth edited images during training. Our approach addresses these challenges by introducing a novel editing mechanism called the Edit Reversibility Constraint (ERC), which applies forward and reverse edits in one training step. This allows us to bypass the need for ground-truth edited images and unlock training for the first time on datasets comprising either real image-caption pairs or image-caption-instruction triplets.
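The Edit Reversibility Constraint reduces to a reconstruction objective: applying the forward instruction and then its reverse should recover the input, so no ground-truth edited image is needed. This minimal sketch assumes a generic differentiable `editor` callable and a plain MSE, both illustrative choices rather than the paper's exact loss.

```python
import torch

def edit_reversibility_loss(editor, image, fwd_instr, rev_instr):
    """Edit Reversibility Constraint, sketched under simple assumptions.

    `editor(image, instruction)` is any differentiable instruction-based
    editor. Editing forward and then with the reverse instruction should
    reconstruct the original image; the loss penalizes the residual.
    """
    edited = editor(image, fwd_instr)
    restored = editor(edited, rev_instr)
    return torch.mean((restored - image) ** 2)
```

In practice the reverse instruction would come from an instruction-inversion step (e.g. "add a hat" vs. "remove the hat"); here it is simply a second argument.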
arXiv Detail & Related papers (2024-12-19T18:59:58Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate this problem in a text-driven manner with Contrastive Language-Image Pre-training (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
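A contrastive loss over predefined CLIP-space directions can be sketched as a cross-entropy over direction similarities: the edit direction (edited minus source embedding) should align with the target attribute direction and stay dissimilar from the others. The temperature, normalization, and function signature below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def directional_contrastive_loss(src_emb, edit_emb, directions, target_idx, tau=0.1):
    """Contrastive loss over predefined CLIP-space directions (a sketch).

    src_emb, edit_emb: (batch, dim) CLIP embeddings of the source and
    edited images. directions: (K, dim) unit-normalized attribute
    directions. target_idx: index of the desired attribute direction.
    """
    delta = F.normalize(edit_emb - src_emb, dim=-1)        # (batch, dim)
    logits = delta @ directions.t() / tau                  # (batch, K)
    target = torch.full((delta.size(0),), target_idx, dtype=torch.long)
    return F.cross_entropy(logits, target)
```

Treating the non-target directions as negatives is what pushes the edit toward the desired attribute "from different perspectives" rather than merely maximizing one similarity.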
arXiv Detail & Related papers (2022-07-06T17:02:25Z) - Towards Disentangling Latent Space for Unsupervised Semantic Face Editing [21.190437168936764]
Supervised attribute editing requires annotated training data which is difficult to obtain and limits the editable attributes to those with labels.
In this paper, we present a new technique termed Structure-Texture Independent Architecture with Weight Decomposition and Orthogonal Regularization (STIA-WO) to disentangle the latent space for unsupervised semantic face editing.
arXiv Detail & Related papers (2020-11-05T03:29:24Z) - Sketchformer: Transformer-based Representation for Sketched Structure [12.448155157592895]
Sketchformer is a transformer-based representation for encoding free-hand sketches input in a vector form.
We report several variants exploring continuous and tokenized input representations, and contrast their performance.
Our learned embedding, driven by a dictionary learning tokenization scheme, yields state of the art performance in classification and image retrieval tasks.
arXiv Detail & Related papers (2020-02-24T17:11:53Z) - Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches.
Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.