Training Free Stylized Abstraction
- URL: http://arxiv.org/abs/2505.22663v2
- Date: Tue, 01 Jul 2025 17:38:27 GMT
- Title: Training Free Stylized Abstraction
- Authors: Aimon Rahman, Kartik Narayan, Vishal M. Patel,
- Abstract summary: Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion.<n>We propose a training-free framework that generates stylized abstractions from a single image using inference-time scaling in vision-language models (VLLMs)<n>Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style.
- Score: 27.307331773270676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion. Unlike image-to-image translation, which prioritizes structural fidelity, stylized abstraction demands selective retention of identity cues while embracing stylistic divergence, especially challenging for out-of-distribution individuals. We propose a training-free framework that generates stylized abstractions from a single image using inference-time scaling in vision-language models (VLLMs) to extract identity-relevant features, and a novel cross-domain rectified flow inversion strategy that reconstructs structure based on style-dependent priors. Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style. It supports multi-round abstraction-aware generation without fine-tuning. To evaluate this task, we introduce StyleBench, a GPT-based human-aligned metric suited for abstract styles where pixel-level similarity fails. Experiments across diverse abstraction (e.g., LEGO, knitted dolls, South Park) show strong generalization to unseen identities and styles in a fully open-source setup.
Related papers
- ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints [13.2441524021269]
ShapeShift is a text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations.<n>We introduce a content-aware collision resolution mechanism that applies minimal semantically coherent adjustments when overlaps occur.<n>Our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt.
arXiv Detail & Related papers (2025-03-18T20:48:58Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval? [120.49126407479717]
We propose a sketch-based image retrieval framework capable of handling sketch abstraction at varied levels.
For granularity-level abstraction understanding, we dictate that the retrieval model should not treat all abstraction-levels equally.
Our Acc.@q loss uniquely allows a sketch to narrow/broaden its focus in terms of how stringent the evaluation should be.
arXiv Detail & Related papers (2024-03-11T23:08:29Z) - Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image
Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z) - A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive
Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z) - SimAN: Exploring Self-Supervised Representation Learning of Scene Text
via Similarity-Aware Normalization [66.35116147275568]
Self-supervised representation learning has drawn considerable attention from the scene text recognition community.
We tackle the issue by formulating the representation learning scheme in a generative manner.
We propose a Similarity-Aware Normalization (SimAN) module to identify the different patterns and align the corresponding styles from the guiding patch.
arXiv Detail & Related papers (2022-03-20T08:43:10Z) - StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
Crossmodal matching problem is typically solved by learning a joint embedding space where semantic content shared between photo and sketch modalities are preserved.
An effective model needs to explicitly account for this style diversity, crucially, to unseen user styles.
Our model can not only disentangle the cross-modal shared semantic content, but can adapt the disentanglement to any unseen user style as well, making the model truly agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.