Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
- URL: http://arxiv.org/abs/2602.12280v1
- Date: Thu, 12 Feb 2026 18:59:54 GMT
- Title: Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
- Authors: Huai-Hsun Cheng, Siang-Ling Zhang, Yu-Lun Liu
- Abstract summary: We introduce Progressive Semantic Illusions, a novel vector sketching task where a single sketch undergoes a dramatic semantic transformation through the sequential addition of strokes. We present Stroke of Surprise, a generative framework that optimizes vector strokes to satisfy distinct semantic interpretations. Our method significantly outperforms state-of-the-art baselines in recognizability and illusion strength.
- Score: 5.052864647270501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel vector sketching task where a single sketch undergoes a dramatic semantic transformation through the sequential addition of strokes. We present Stroke of Surprise, a generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. The core challenge lies in the "dual-constraint": initial prefix strokes must form a coherent object (e.g., a duck) while simultaneously serving as the structural foundation for a second concept (e.g., a sheep) upon adding delta strokes. To address this, we propose a sequence-aware joint optimization framework driven by a dual-branch Score Distillation Sampling (SDS) mechanism. Unlike sequential approaches that freeze the initial state, our method dynamically adjusts prefix strokes to discover a "common structural subspace" valid for both targets. Furthermore, we introduce a novel Overlay Loss that enforces spatial complementarity, ensuring structural integration rather than occlusion. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines in recognizability and illusion strength, successfully expanding visual anagrams from the spatial to the temporal dimension. Project page: https://stroke-of-surprise.github.io/
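The dual-constraint described in the abstract can be illustrated with a toy joint-optimization sketch. This is not the authors' implementation: simple quadratic targets stand in for the two diffusion-based SDS branches, a Gaussian overlap penalty stands in for the Overlay Loss, and all names and constants (`TARGET_A`, `TARGET_B`, `SIGMA`, `LAMBDA`) are illustrative assumptions. The point is the gradient flow: prefix strokes receive gradients from both semantic targets simultaneously, while delta strokes serve only the second concept.

```python
import numpy as np

# Toy stand-in for the dual-branch objective: quadratic "semantic" targets
# replace the diffusion-based SDS branches, and a Gaussian overlap penalty
# stands in for the Overlay Loss. All constants are illustrative.
rng = np.random.default_rng(0)
N_PREFIX, N_DELTA, SIGMA, LAMBDA = 8, 4, 0.01, 0.1
TARGET_A = np.full((N_PREFIX, 2), 0.3)            # concept A: prefix strokes only
TARGET_B = np.full((N_PREFIX + N_DELTA, 2), 0.7)  # concept B: prefix + delta strokes

prefix = rng.normal(0.5, 0.1, size=(N_PREFIX, 2))  # strokes drawn first
delta = rng.normal(0.5, 0.1, size=(N_DELTA, 2))    # strokes added later

for _ in range(400):
    diff = prefix[:, None, :] - delta[None, :, :]            # pairwise offsets
    w = np.exp(-np.sum(diff ** 2, axis=-1) / SIGMA)          # overlap weights
    # gradient of the overlap penalty pushes coincident strokes apart
    g_over_p = (-2.0 / SIGMA) * np.sum(w[..., None] * diff, axis=1)
    g_over_d = (2.0 / SIGMA) * np.sum(w[..., None] * diff, axis=0)
    # branch A sees only the prefix; branch B sees the full sketch,
    # so prefix strokes receive gradients from BOTH targets at once
    g_prefix = (2 * (prefix - TARGET_A)
                + 2 * (prefix - TARGET_B[:N_PREFIX])
                + LAMBDA * g_over_p)
    g_delta = 2 * (delta - TARGET_B[N_PREFIX:]) + LAMBDA * g_over_d
    prefix -= 0.02 * g_prefix
    delta -= 0.02 * g_delta

# prefix settles near the compromise between both targets (a "common
# structural subspace"), while delta strokes commit to the second concept
print(prefix.mean(axis=0), delta.mean(axis=0))
```

In this toy setting the prefix converges to the midpoint of the two targets rather than either one, which mirrors the paper's claim that dynamically adjusting the prefix (instead of freezing it) finds a structure valid for both interpretations.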
Related papers
- Attention-space Contrastive Guidance for Efficient Hallucination Mitigation in LVLMs [9.043999205886658]
Hallucinations in large vision-language models often arise when language priors dominate over visual evidence. We propose Attention-space Contrastive Guidance (ACG), a single-pass mechanism that operates within self-attention layers to construct both vision-language and language-only attention paths. ACG achieves state-of-the-art faithfulness and caption quality while significantly reducing computational cost.
arXiv Detail & Related papers (2026-01-20T08:04:18Z)
- RecTok: Reconstruction Distillation along Rectified Flow [85.51292475005151]
We propose RecTok, which overcomes the limitations of high-dimensional visual tokenizers through two key innovations. Our method distills the semantic information in VFMs into the forward flow trajectories in flow matching. Our RecTok achieves superior image reconstruction, generation quality, and discriminative performance.
arXiv Detail & Related papers (2025-12-15T15:14:20Z)
- SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation [114.57192386025373]
SegSplat is a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. This work represents a significant step towards practical, on-the-fly generation of semantically aware 3D environments.
arXiv Detail & Related papers (2025-11-23T10:26:38Z)
- Dense Semantic Matching with VGGT Prior [49.42199006453071]
We propose an approach that retains VGGT's intrinsic strengths by reusing early feature stages, fine-tuning later ones, and adding a semantic head for bidirectional correspondences. Our approach achieves superior geometry awareness, matching reliability, and manifold preservation, outperforming previous baselines.
arXiv Detail & Related papers (2025-09-25T14:56:11Z)
- StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion [13.862427684807486]
StrokeFusion is a two-stage framework for vector sketch generation. It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space. It exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation.
arXiv Detail & Related papers (2025-03-31T06:03:03Z)
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation [12.486307321835909]
VQ-SGen is a novel algorithm for high-quality creative sketch generation. We introduce a vector-quantized (VQ) stroke representation for fine-grained sketch generation. Our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset.
arXiv Detail & Related papers (2024-11-25T14:51:22Z)
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
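The entry above describes projecting prompt gradients into the null space spanned by previous tasks' features. A minimal linear-algebra sketch of that idea, assuming an SVD-based projector with made-up shapes and names (this is the generic construction, not the paper's approximation scheme):

```python
import numpy as np

# Hypothetical sketch of null-space gradient projection: restrict updates
# to directions orthogonal to previous tasks' feature subspace, so tuning
# the prompt for a new task does not interfere with old tasks.
def null_space_projector(features, tol=1e-8):
    """Return P such that P @ g lies in the null space of `features` rows.

    `features`: shape [n, d], one previous-task feature vector per row.
    """
    _, s, vt = np.linalg.svd(features, full_matrices=True)
    rank = int(np.sum(s > tol))
    v_null = vt[rank:].T          # orthonormal basis of the null space (d x k)
    return v_null @ v_null.T      # orthogonal projector onto it (d x d)

rng = np.random.default_rng(1)
prev_features = rng.normal(size=(3, 6))  # 3 old-task feature vectors, dim 6
P = null_space_projector(prev_features)
grad = rng.normal(size=6)                # raw prompt gradient
grad_proj = P @ grad                     # interference-free update direction

# the projected gradient is orthogonal to every previous-task feature
print(np.abs(prev_features @ grad_proj).max())
```

Applying `grad_proj` instead of `grad` leaves the model's response along all previous-task feature directions unchanged to first order, which is the orthogonality property the abstract refers to.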
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
- Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach [53.376029341079054]
We propose a combined generative and contrastive neural architecture for learning latent representations of 3D shapes. The architecture uses two encoder branches for voxel grids and multi-view images from the same underlying shape.
arXiv Detail & Related papers (2023-01-11T18:14:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.