Related papers: Multi-Style Facial Sketch Synthesis through Masked Generative Modeling


URL: http://arxiv.org/abs/2408.12400v1
Date: Thu, 22 Aug 2024 13:45:04 GMT
Title: Multi-Style Facial Sketch Synthesis through Masked Generative Modeling
Authors: Bowen Sun, Guo Lu, Shibao Zheng
Abstract summary: We propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches. In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process. Our method consistently outperforms previous algorithms across multiple benchmarks.
Score: 17.313050611750413
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The facial sketch synthesis (FSS) model, capable of generating sketch portraits from given facial photographs, holds profound implications across multiple domains, encompassing cross-modal face recognition, entertainment, art, media, among others. However, the production of high-quality sketches remains a formidable task, primarily due to the challenges and flaws associated with three key factors: (1) the scarcity of artist-drawn data, (2) the constraints imposed by limited style types, and (3) the deficiencies of processing input information in existing models. To address these difficulties, we propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches, obviating the necessity for any supplementary inputs (e.g., 3D geometry). In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process. Additionally, we employ a feature extraction module and style embeddings to proficiently steer the generative transformer during the iterative prediction of masked image tokens, thus achieving a continuous stylized output that retains facial features accurately in sketches. The extensive experiments demonstrate that our method consistently outperforms previous algorithms across multiple benchmarks, exhibiting a discernible disparity.
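The abstract mentions "iterative prediction of masked image tokens" without giving details, so the following is only a minimal sketch of one common scheme in masked generative modeling, not the authors' implementation. It assumes a cosine unmasking schedule and confidence-based commitment of tokens. The names `MASK`, `toy_predict`, and `iterative_decode` are hypothetical, and the predictor is a random stand-in for a real transformer conditioned on image features and a style embedding.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a token that is still masked


def toy_predict(tokens, style_id, vocab_size, rng):
    """Random stand-in for the generative transformer (assumption).

    Returns one (token, confidence) guess per position. A real model would
    condition on image features and a style embedding; here the style id
    only shifts the random guesses.
    """
    guesses = []
    for _ in tokens:
        tok = (rng.randrange(vocab_size) + style_id) % vocab_size
        guesses.append((tok, rng.random()))
    return guesses


def iterative_decode(num_tokens, vocab_size, style_id, steps=8, seed=0):
    """Start fully masked and commit the most confident guesses each step."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_predict(tokens, style_id, vocab_size, rng)
        # Cosine schedule (assumed): number of tokens left masked after this step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / steps))
        # Always commit at least one token so decoding makes progress.
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the highest-confidence guesses among still-masked positions.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = guesses[i][0]
    return tokens


if __name__ == "__main__":
    sketch_tokens = iterative_decode(num_tokens=16, vocab_size=32, style_id=2)
    print(sketch_tokens)
```

In this sketch the schedule reaches zero at the final step, so every position is filled by then. In a full pipeline the resulting token grid would be passed to an image tokenizer's decoder to produce the sketch, a stage not shown here.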

Related papers

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment [51.40989269202702]
The aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AIGC. We propose ArtQuant, an aesthetics assessment framework for artistic images which couples isolated aesthetic dimensions through description generation. Our approach achieves epoch state-of-the-art performance on several datasets while requiring only 33% of conventional trainings.
arXiv Detail & Related papers (2025-12-29T12:18:26Z)
ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models [70.28556518166037]
We introduce ViewMask-1-to-3, a pioneering approach to apply discrete diffusion models to multi-view image generation. By unifying language and vision through masked token prediction, our approach enables progressive generation of multiple viewpoints. Our approach ranks first on average across GSO and 3D-FUTURE datasets in terms of PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2025-12-16T05:15:07Z)
MSMA: Multi-Scale Feature Fusion For Multi-Attribute 3D Face Reconstruction From Unconstrained Images [0.0]
Reconstructing 3D face from a single unconstrained image remains a challenging problem due to diverse conditions in unconstrained environments. We propose a Multi-Scale Feature Fusion with Multi-Attribute framework for 3D face reconstruction from unconstrained images.
arXiv Detail & Related papers (2025-09-15T10:30:08Z)
SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches [116.1810651297801]
SketchYourSeg establishes freehand sketches as a powerful query modality for subjective image segmentation. Our evaluations demonstrate superior performance over existing approaches across diverse benchmarks.
arXiv Detail & Related papers (2025-01-27T13:07:51Z)
ImFace++: A Sophisticated Nonlinear 3D Morphable Face Model with Implicit Neural Representations [25.016000421755162]
This paper presents a novel 3D morphable face model, named ImFace++, to learn a sophisticated and continuous space with implicit neural representations. ImFace++ first constructs two explicitly disentangled deformation fields to model complex shapes associated with identities and expressions. A refinement displacement field within the template space is further incorporated, enabling fine-grained learning of individual-specific facial details.
arXiv Detail & Related papers (2023-12-07T03:53:53Z)
Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis [0.0]
NeRFs have enabled highly realistic synthesis of human faces including complex appearance and reflectance effects of hair and skin. We propose a novel human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior's training distribution.
arXiv Detail & Related papers (2023-09-28T21:21:44Z)
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images. For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
Uncertainty-Aware Cross-Modal Transfer Network for Sketch-Based 3D Shape Retrieval [8.765045867163646]
This paper presents an uncertainty-aware cross-modal transfer network (UACTN) that addresses this issue. We first introduce an end-to-end classification-based approach that simultaneously learns sketch features and uncertainty. Then, 3D shape features are mapped into the pre-learned sketch embedding space for feature alignment.
arXiv Detail & Related papers (2023-08-11T05:46:52Z)
SARGAN: Spatial Attention-based Residuals for Facial Expression Manipulation [1.7056768055368383]
We present a novel method named SARGAN that addresses the limitations from three perspectives. We exploited a symmetric encoder-decoder network to attend facial features at multiple scales. Our proposed model performs significantly better than state-of-the-art methods.
arXiv Detail & Related papers (2023-03-30T08:15:18Z)
Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings. We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
Facial Geometric Detail Recovery via Implicit Representation [147.07961322377685]
We present a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image. Our method combines high-quality texture completion with the powerful expressiveness of implicit surfaces. Our method not only recovers accurate facial details but also decomposes normals, albedos, and shading parts in a self-supervised way.
arXiv Detail & Related papers (2022-03-18T01:42:59Z)
IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval [112.1756171062067]
We introduce a novel semi-supervised framework for cross-modal retrieval. At the centre of our design is a sequential photo-to-sketch generation model. We also introduce a discriminator guided mechanism to guide against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)
Deep Self-Supervised Representation Learning for Free-Hand Sketch [51.101565480583304]
We tackle the problem of self-supervised representation learning for free-hand sketches. Key for the success of our self-supervised learning paradigm lies with our sketch-specific designs. We show that the proposed approach outperforms the state-of-the-art unsupervised representation learning methods.
arXiv Detail & Related papers (2020-02-03T16:28:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.