Scenimefy: Learning to Craft Anime Scene via Semi-Supervised
Image-to-Image Translation
- URL: http://arxiv.org/abs/2308.12968v1
- Date: Thu, 24 Aug 2023 17:59:50 GMT
- Title: Scenimefy: Learning to Craft Anime Scene via Semi-Supervised
Image-to-Image Translation
- Authors: Yuxin Jiang, Liming Jiang, Shuai Yang, Chen Change Loy
- Abstract summary: We propose Scenimefy, a novel semi-supervised image-to-image translation framework.
Our approach guides the learning with structure-consistent pseudo paired data.
A patch-wise contrastive style loss is introduced to improve stylization and fine details.
- Score: 75.91455714614966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic high-quality rendering of anime scenes from complex real-world
images is of significant practical value. The challenges of this task lie in
the complexity of the scenes, the unique features of anime style, and the lack
of high-quality datasets to bridge the domain gap. Despite promising attempts,
previous efforts still fall short of satisfactory results with
consistent semantic preservation, evident stylization, and fine details. In
this study, we propose Scenimefy, a novel semi-supervised image-to-image
translation framework that addresses these challenges. Our approach guides the
learning with structure-consistent pseudo paired data, simplifying the pure
unsupervised setting. The pseudo data are derived uniquely from a
semantic-constrained StyleGAN leveraging rich model priors like CLIP. We
further apply segmentation-guided data selection to obtain high-quality pseudo
supervision. A patch-wise contrastive style loss is introduced to improve
stylization and fine details. Besides, we contribute a high-resolution anime
scene dataset to facilitate future research. Our extensive experiments
demonstrate the superiority of our method over state-of-the-art baselines in
terms of both perceptual quality and quantitative performance.
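The abstract names a patch-wise contrastive style loss without spelling it out. As a rough, non-authoritative illustration of the general idea (a PatchNCE-style InfoNCE objective over sampled patch features; the actual Scenimefy loss may differ), the sketch below assumes `feat_out` and `feat_ref` come from a shared patch feature extractor applied to the translated image and its pseudo-paired anime reference:

```python
# Minimal sketch of a patch-wise contrastive (InfoNCE-style) loss.
# Illustration of the general idea only, not the exact loss used in
# Scenimefy; the patch sampling and feature extractor are simplified away.
import torch
import torch.nn.functional as F


def patch_contrastive_loss(feat_out, feat_ref, temperature=0.07):
    """feat_out, feat_ref: (N, C) features of N sampled patches from the
    translated image and its (pseudo-paired) anime reference.
    Co-located patches are treated as positives, all others as negatives."""
    feat_out = F.normalize(feat_out, dim=1)
    feat_ref = F.normalize(feat_ref, dim=1)
    # (N, N) similarity matrix; diagonal entries are the positive pairs.
    logits = feat_out @ feat_ref.t() / temperature
    targets = torch.arange(feat_out.size(0), device=feat_out.device)
    return F.cross_entropy(logits, targets)


# Toy usage with random patch features standing in for encoder outputs.
loss = patch_contrastive_loss(torch.randn(64, 256), torch.randn(64, 256))
```

Treating co-located patches as positives and all other patches in the batch as negatives encourages the local texture statistics of the output to match the anime reference, which is the intuition behind this family of losses.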
Related papers
- Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients [0.0]
This paper proposes an enhanced unsupervised image-to-image translation method based on the Contrastive Unpaired Translation (CUT) model.
This novel approach ensures the preservation of the semantic structure of images, even without semantic labels.
The method was tested on translating synthetic game environments from the GTA5 dataset to realistic urban scenes from the Cityscapes dataset (a rough sketch of the HOG-based structure check follows this entry).
arXiv Detail & Related papers (2024-09-24T12:44:27Z)
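As referenced in the entry above, here is a loose illustration of how Histogram of Oriented Gradients descriptors can quantify whether the semantic structure of a source image survives translation. This is a generic sketch under assumed inputs, not the loss or pipeline of the paper above; `hog_consistency` and its parameters are illustrative.

```python
# Hypothetical sketch: measure structural agreement between a source image
# and its translation by comparing HOG descriptors. Generic illustration,
# not the method from the paper above.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog


def hog_consistency(src_rgb, out_rgb):
    """src_rgb, out_rgb: HxWx3 float arrays in [0, 1] of the same size.
    Returns the mean squared difference of their HOG descriptors
    (lower = more similar edge/structure layout)."""
    h_src = hog(rgb2gray(src_rgb), orientations=9,
                pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    h_out = hog(rgb2gray(out_rgb), orientations=9,
                pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return float(np.mean((h_src - h_out) ** 2))


# Toy usage with random images of matching size.
score = hog_consistency(np.random.rand(128, 128, 3),
                        np.random.rand(128, 128, 3))
```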
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We report extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis on the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- Harnessing the Conditioning Sensorium for Improved Image Translation [2.9631016562930546]
Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image.
We propose a new approach to learn disentangled 'content' and 'style' representations from scratch.
We define 'content' based on conditioning information extracted by off-the-shelf pre-trained models.
We then train our style extractor and image decoder with an easy-to-optimize set of reconstruction objectives.
arXiv Detail & Related papers (2021-10-13T02:07:43Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters (a rough sketch of this mechanism follows the list below).
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
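The SAWN entry above mentions warping modulation parameters with a learned flow field. Below is a minimal sketch of that general mechanism; all shapes, names, and the instance-norm choice are assumptions made for illustration, not details taken from the paper.

```python
# Rough sketch of flow-warped spatially-adaptive modulation, in the spirit
# of the SAWN entry above. Shapes, names, and the surrounding architecture
# are assumptions for illustration only.
import torch
import torch.nn.functional as F


def warp(params, flow):
    """params: (B, C, H, W) spatial modulation parameters (e.g. scale/shift maps).
    flow: (B, 2, H, W) offsets in normalized [-1, 1] coordinates."""
    B, _, H, W = params.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)  # identity grid
    grid = base + flow.permute(0, 2, 3, 1)                   # add learned offsets
    return F.grid_sample(params, grid, align_corners=True)


def modulate(feat, gamma, beta, flow):
    """Apply flow-warped scale/shift modulation to a normalized feature map."""
    feat = F.instance_norm(feat)
    return warp(gamma, flow) * feat + warp(beta, flow)


# Toy usage with a zero flow field (reduces to ordinary spatial modulation).
x = torch.randn(1, 64, 32, 32)
out = modulate(x, torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32),
               torch.zeros(1, 2, 32, 32))
```

Here `grid_sample` resamples the per-pixel scale and shift maps along the learned flow before they modulate the normalized features, which is the essential idea of flow-warped modulation.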
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.