MOST-Net: A Memory Oriented Style Transfer Network for Face Sketch
Synthesis
- URL: http://arxiv.org/abs/2202.03596v1
- Date: Tue, 8 Feb 2022 01:51:24 GMT
- Title: MOST-Net: A Memory Oriented Style Transfer Network for Face Sketch
Synthesis
- Authors: Fan Ji, Muyi Sun, Xingqun Qi, Qi Li, Zhenan Sun
- Abstract summary: Face sketch synthesis has been widely used in multimedia entertainment and law enforcement.
Current image-to-image translation-based face sketch synthesis methods frequently over-fit on small-scale datasets.
We present an end-to-end Memory Oriented Style Transfer Network (MOST-Net) for face sketch synthesis that can produce high-fidelity sketches with limited data.
- Score: 41.80739104463557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face sketch synthesis has been widely used in multimedia entertainment and
law enforcement. Despite the recent developments in deep neural networks,
accurate and realistic face sketch synthesis is still a challenging task due to
the diversity and complexity of human faces. Current image-to-image
translation-based face sketch synthesis frequently encounters over-fitting
problems when it comes to small-scale datasets. To tackle this problem, we
present an end-to-end Memory Oriented Style Transfer Network (MOST-Net) for
face sketch synthesis which can produce high-fidelity sketches with limited
data. Specifically, an external self-supervised dynamic memory module is
introduced to capture domain-alignment knowledge over the long term. In this
way, our proposed model can acquire domain-transfer ability by establishing a
durable relationship between faces and their corresponding sketches at the
feature level. Furthermore, we design a novel Memory Refinement Loss (MR Loss)
for feature alignment in the memory module, which enhances the accuracy
of memory slots in an unsupervised manner. Extensive experiments on the CUFS
and the CUFSF datasets show that our MOST-Net achieves state-of-the-art
performance, especially in terms of the Structural Similarity Index (SSIM).
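The abstract describes the external memory as a long-term store that aligns photo-domain and sketch-domain features, refined by the MR Loss. The paper's actual implementation is not reproduced here; the following pure-Python sketch is only an illustrative assumption of how such a key-value memory might behave. All names, the momentum update rule, and the mean-squared stand-in for the MR Loss are hypothetical, not taken from the paper:

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors (lists of floats).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

class DynamicMemory:
    """Hypothetical external key-value memory: photo-domain keys are paired
    with sketch-domain values, and slots are updated with a momentum rule so
    the photo-sketch association persists over long-term training."""

    def __init__(self, num_slots, dim, momentum=0.9):
        self.keys = [[0.0] * dim for _ in range(num_slots)]    # photo features
        self.values = [[0.0] * dim for _ in range(num_slots)]  # sketch features
        self.momentum = momentum

    def read(self, query):
        # Retrieve the sketch-domain value whose key best matches the query.
        best = max(range(len(self.keys)),
                   key=lambda i: cosine(query, self.keys[i]))
        return self.values[best]

    def write(self, slot, photo_feat, sketch_feat):
        # Momentum update keeps slot contents stable across iterations.
        m = self.momentum
        self.keys[slot] = [m * k + (1 - m) * p
                           for k, p in zip(self.keys[slot], photo_feat)]
        self.values[slot] = [m * v + (1 - m) * s
                             for v, s in zip(self.values[slot], sketch_feat)]

def mr_loss(retrieved, target):
    # Hypothetical stand-in for the MR Loss: mean squared distance between
    # the retrieved memory value and the ground-truth sketch feature.
    return sum((r - t) ** 2 for r, t in zip(retrieved, target)) / len(retrieved)

# Toy usage: store one photo/sketch feature pair, then query with a nearby
# photo feature and score the retrieval against the true sketch feature.
mem = DynamicMemory(num_slots=2, dim=3)
mem.write(0, photo_feat=[1.0, 0.0, 0.0], sketch_feat=[0.0, 1.0, 0.0])
retrieved = mem.read([0.9, 0.1, 0.0])
loss = mr_loss(retrieved, [0.0, 1.0, 0.0])
```

In a real network the keys and values would be CNN feature maps and the loss would drive slot refinement during training; the toy lists above only illustrate the read/write/refine cycle.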
Related papers
- Modern Hopfield Networks meet Encoded Neural Representations -- Addressing Practical Considerations [5.272882258282611]
This paper introduces HEN, a framework that integrates encoded representations into Modern Hopfield Networks (MHNs) to improve pattern separability and reduce meta-stable states.
We show that HEN can also be used for retrieval in the context of hetero association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain.
arXiv Detail & Related papers (2024-09-24T19:17:15Z)
- CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes [9.217592165862762]
We propose continual dynamic neural graphics primitives (CD-NGP) for view synthesis.
Our approach synergizes features from both temporal and spatial hash encodings to achieve high rendering quality.
We introduce a novel dataset comprising multi-view, exceptionally long video sequences with substantial rigid and non-rigid motion.
arXiv Detail & Related papers (2024-09-08T17:35:48Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning.
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
- Face Sketch Synthesis via Semantic-Driven Generative Adversarial Network [10.226808267718523]
We propose a novel Semantic-Driven Generative Adversarial Network (SDGAN) which embeds global structure-level style injection and local class-level knowledge re-weighting.
Specifically, we conduct facial saliency detection on the input face photos to provide overall facial texture structure.
In addition, we exploit face parsing layouts as the semantic-level spatial prior to enforce globally structural style injection in the generator of SDGAN.
arXiv Detail & Related papers (2021-06-29T07:03:56Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation [75.59415852802958]
Shape-My-Face (SMF) is a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model.
Our model provides topologically-sound meshes with minimal supervision, offers faster training time, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets.
arXiv Detail & Related papers (2020-12-16T20:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.