Deep Facial Synthesis: A New Challenge
- URL: http://arxiv.org/abs/2112.15439v1
- Date: Fri, 31 Dec 2021 13:19:21 GMT
- Title: Deep Facial Synthesis: A New Challenge
- Authors: Deng-Ping Fan, Ziling Huang, Peng Zheng, Hong Liu, Xuebin Qin, and Luc Van Gool
- Abstract summary: We first introduce a high-quality dataset for FSS, named FS2K, which consists of 2,104 image-sketch pairs.
Second, we present the largest-scale FSS study by investigating 139 classical methods.
Third, we present a simple baseline for FSS, named FSGAN.
- Score: 75.99659340231078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of this paper is to conduct a comprehensive study on the facial
sketch synthesis (FSS) problem. However, owing to the high cost of obtaining hand-drawn sketch datasets, a complete benchmark for assessing the development of FSS algorithms over the last decade has been lacking. As such, we first introduce
a high-quality dataset for FSS, named FS2K, which consists of 2,104
image-sketch pairs spanning three types of sketch styles, image backgrounds,
lighting conditions, skin colors, and facial attributes. FS2K differs from
previous FSS datasets in difficulty, diversity, and scalability, and should
thus facilitate the progress of FSS research. Second, we present the
largest-scale FSS study by investigating 139 classical methods, including 24
handcrafted-feature-based facial sketch synthesis approaches, 37 general
neural-style transfer methods, 43 deep image-to-image translation methods, and
35 image-to-sketch approaches. In addition, we conduct comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple baseline for
FSS, named FSGAN. With only two straightforward components, i.e., facial-aware
masking and style-vector expansion, FSGAN surpasses the performance of all
previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with the lessons learned over the past years and
point out several unsolved challenges. Our open-source code is available at
https://github.com/DengPingFan/FSGAN.
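The abstract does not detail the two FSGAN components, so the following is a minimal, hypothetical PyTorch sketch of how facial-aware masking and style-vector expansion could be wired into a generator. All module names, shapes, and the residual/AdaIN-style formulations are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Hypothetical sketch of FSGAN's two components; names and shapes are
# assumptions, not the authors' code (see https://github.com/DengPingFan/FSGAN).
import torch
import torch.nn as nn

class FacialAwareMasking(nn.Module):
    """Re-weights generator features by a facial-region mask (e.g., from a face parser)."""
    def forward(self, feats: torch.Tensor, face_mask: torch.Tensor) -> torch.Tensor:
        # face_mask: (B, 1, H, W) in [0, 1]; emphasize facial regions, keep background.
        mask = nn.functional.interpolate(
            face_mask, size=feats.shape[-2:], mode="bilinear", align_corners=False)
        return feats * (1.0 + mask)  # residual re-weighting

class StyleVectorExpansion(nn.Module):
    """Expands a per-style embedding and injects it into spatial feature maps."""
    def __init__(self, num_styles: int = 3, style_dim: int = 64, feat_ch: int = 256):
        super().__init__()
        self.embed = nn.Embedding(num_styles, style_dim)   # one vector per sketch style
        self.expand = nn.Linear(style_dim, feat_ch * 2)    # -> per-channel scale and shift

    def forward(self, feats: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        scale, shift = self.expand(self.embed(style_id)).chunk(2, dim=-1)
        scale, shift = scale[:, :, None, None], shift[:, :, None, None]
        return feats * (1.0 + scale) + shift               # AdaIN-style modulation

# Usage inside a generator forward pass:
feats = torch.randn(2, 256, 64, 64)
mask = torch.rand(2, 1, 256, 256)
style = torch.tensor([0, 2])                               # FS2K spans three sketch styles
feats = FacialAwareMasking()(feats, mask)
feats = StyleVectorExpansion()(feats, style)
```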
Related papers
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
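The entry above does not specify how the coarse correspondence is constructed; a common recipe consistent with the summary is to match frozen foundation-model features between the support and query images. A hedged PyTorch sketch under that assumption (backbone choice, shapes, and thresholds are placeholders):

```python
# Hedged sketch: building a coarse query prior from frozen-backbone features,
# one common way to "extract implicit knowledge" for few-shot segmentation.
# The backbone and shapes are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def coarse_correspondence(sup_feat, sup_mask, qry_feat):
    """sup_feat, qry_feat: (C, H, W) frozen features; sup_mask: (H, W) in {0, 1}."""
    C, H, W = sup_feat.shape
    sup = F.normalize(sup_feat.reshape(C, -1), dim=0)        # (C, HW) unit columns
    qry = F.normalize(qry_feat.reshape(C, -1), dim=0)        # (C, HW)
    sim = qry.t() @ sup                                      # (HW_q, HW_s) cosine similarities
    fg = sup_mask.reshape(-1) > 0.5                          # assumes >= 1 foreground pixel
    prior = sim[:, fg].max(dim=1).values                     # best match to any foreground pixel
    return prior.reshape(H, W)                               # coarse foreground prior for the query

# Usage with dummy features standing in for a frozen foundation model (e.g., DINOv2):
sup_feat, qry_feat = torch.randn(384, 32, 32), torch.randn(384, 32, 32)
sup_mask = (torch.rand(32, 32) > 0.7).float()
prior = coarse_correspondence(sup_feat, sup_mask, qry_feat)  # (32, 32) map in [-1, 1]
```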
SfM on-the-fly: Get better 3D from What You Capture [24.141351494527303]
Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics, etc.
This work builds upon the original on-the-fly SfM and presents an updated version with three new advancements to get better 3D from what you capture.
arXiv Detail & Related papers (2024-07-04T13:52:37Z)
MegaScenes: Scene-Level View Synthesis at Scale [69.21293001231993]
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications.
We create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world.
We analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency.
arXiv Detail & Related papers (2024-06-17T17:55:55Z)
Enhanced fringe-to-phase framework using deep learning [2.243491254050456]
We introduce SFNet, a symmetric fusion network that transforms two fringe images into an absolute phase.
To enhance output reliability, our framework predicts refined phases by incorporating information from fringe images of a different frequency than those used as input.
arXiv Detail & Related papers (2024-02-01T19:47:34Z)
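For context on the preceding entry, the classical operation that a fringe-to-phase network such as SFNet learns end-to-end is two-frequency (temporal) phase unwrapping: a low-frequency wrapped phase resolves the 2π ambiguities of a high-frequency one. A minimal NumPy illustration of that math, not the paper's model:

```python
# Classical two-frequency (temporal) phase unwrapping; a fringe-to-phase
# network learns this mapping end-to-end from the raw fringe images.
import numpy as np

def unwrap_two_freq(phi_high, phi_low, freq_ratio):
    """phi_high, phi_low: wrapped phases in [0, 2*pi); freq_ratio = f_high / f_low."""
    # The scaled low-frequency phase approximates the absolute high-frequency phase;
    # round to the nearest fringe order k, then keep the precise high-frequency
    # measurement within that order.
    k = np.round((freq_ratio * phi_low - phi_high) / (2 * np.pi))
    return phi_high + 2 * np.pi * k

# Synthetic check: a phase ramp spanning 8 high-frequency fringe periods, while
# the low-frequency phase stays within a single period (no ambiguity).
x = np.linspace(0, 8 * 2 * np.pi, 1000, endpoint=False)  # ground-truth absolute phase
wrap = lambda p: np.mod(p, 2 * np.pi)
phi_h, phi_l = wrap(x), wrap(x / 8.0)                    # f_high / f_low = 8
assert np.allclose(unwrap_two_freq(phi_h, phi_l, 8.0), x)
```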
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not [109.69076457732632]
We leverage CLIP for zero-shot sketch-based image retrieval (ZS-SBIR).
We put forward novel designs on how best to achieve this synergy.
We observe significant performance gains in the region of 26.9% over previous state-of-the-art.
arXiv Detail & Related papers (2023-03-23T17:02:00Z)
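At its core, CLIP-based ZS-SBIR as in the entry above embeds sketches and photos with CLIP's image encoder and ranks the gallery by cosine similarity; the paper's contribution lies in its designs on top of this. A bare-bones sketch using the open-source clip package (file paths are placeholders):

```python
# Minimal CLIP-based sketch-to-photo retrieval: embed both modalities with the
# image encoder and rank by cosine similarity. This shows the bare-bones idea
# only; the paper adds novel designs on top. File paths are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed(paths):
    imgs = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    with torch.no_grad():
        feats = model.encode_image(imgs)
    return feats / feats.norm(dim=-1, keepdim=True)        # unit-normalize for cosine

gallery = embed(["photo_0.jpg", "photo_1.jpg"])            # placeholder photo gallery
query = embed(["sketch.png"])                              # placeholder query sketch
ranking = (query @ gallery.t()).argsort(descending=True)   # most similar photos first
```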
Data-Free Sketch-Based Image Retrieval [56.96186184599313]
We propose Data-Free (DF)-SBIR, where pre-trained, single-modality classification models have to be leveraged to learn a cross-modal metric space for retrieval without access to any training data.
We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches.
Our method also achieves mAPs competitive with data-dependent approaches, all the while requiring no training data.
arXiv Detail & Related papers (2023-03-14T10:34:07Z)
Exploring Neural Models for Query-Focused Summarization [74.41256438059256]
We conduct a systematic exploration of neural approaches to query-focused summarization (QFS).
We present two model extensions that achieve state-of-the-art performance on the QMSum dataset by a margin of up to 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L.
arXiv Detail & Related papers (2021-12-14T18:33:29Z)
Robust Facial Expression Recognition with Convolutional Visual Transformers [23.05378099875569]
We propose Convolutional Visual Transformers to tackle Facial Expression Recognition in the wild in two main steps.
First, we propose an attentional selective fusion (ASF) for leveraging the feature maps generated by two-branch CNNs.
Second, inspired by the success of Transformers in natural language processing, we propose to model relationships between these visual words with global self-attention.
arXiv Detail & Related papers (2021-03-31T07:07:56Z)
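The two steps in the entry above suggest a simple structure: a learned gate that selectively fuses the two CNN branches, followed by global self-attention over the fused feature map treated as a sequence of visual words. A hedged sketch with illustrative shapes and layers, not the authors' exact architecture:

```python
# Hedged sketch of the two steps: attentional selective fusion (ASF) of
# two-branch CNN feature maps, then global self-attention over the fused
# "visual words". Shapes and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class ASF(nn.Module):
    """Fuse two feature maps with a learned, per-location soft gate."""
    def __init__(self, ch: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([a, b], dim=1))   # (B, C, H, W) weights in (0, 1)
        return g * a + (1 - g) * b                # selective, convex combination

ch, heads = 256, 8
asf = ASF(ch)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=ch, nhead=heads, batch_first=True),
    num_layers=2,
)

a, b = torch.randn(4, ch, 14, 14), torch.randn(4, ch, 14, 14)  # two-branch CNN outputs
fused = asf(a, b)
tokens = fused.flatten(2).transpose(1, 2)        # (B, H*W, C): visual words
out = encoder(tokens)                            # global self-attention over all words
```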