Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale
Transformer
- URL: http://arxiv.org/abs/2109.08024v1
- Date: Tue, 14 Sep 2021 09:40:25 GMT
- Title: Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale
Transformer
- Authors: Fushun Zhu, Shan Zhao, Peng Wang, Hao Wang, Hua Yan, Shuaicheng Liu
- Abstract summary: We propose a semi-supervised network for wide-angle portraits correction.
Our network, named Multi-Scale Swin-Unet (MS-Unet), is built upon the multi-scale swin transformer block (MSTB).
- Score: 17.455782652441187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a semi-supervised network for wide-angle portraits correction.
Wide-angle images often suffer from skew and distortion caused by perspective
projection, which is especially noticeable in face regions. Previous deep-learning-based
approaches require ground-truth correction flow maps as training guidance.
However, such labels are expensive, as they can only be obtained manually. In this
work, we propose a semi-supervised scheme that can consume unlabeled data in
addition to labeled data for further improvement. Specifically, our
semi-supervised scheme takes advantage of a consistency mechanism, with several
novel components such as direction and range consistency (DRC) and regression
consistency (RC). Furthermore, our network, named Multi-Scale Swin-Unet
(MS-Unet), is built upon the multi-scale swin transformer block (MSTB), which
can learn both local-scale and long-range semantic information effectively. In
addition, we introduce a high-quality unlabeled dataset with rich scenarios for
training. Extensive experiments demonstrate that the proposed method outperforms
state-of-the-art methods and other representative baselines.
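The consistency mechanism described above can be illustrated with a minimal numpy sketch. The two loss functions below (`regression_consistency_loss`, `direction_range_consistency_loss`) are hypothetical simplifications of the RC and DRC ideas, assuming correction flows are per-pixel 2D vectors; they are not the paper's exact formulations.

```python
import numpy as np

def regression_consistency_loss(flow_a, flow_b):
    """Mean squared difference between two predicted correction flows.

    A simplified stand-in for a regression consistency (RC) term:
    predictions for an unlabeled image and its perturbed copy should agree.
    """
    return float(np.mean((flow_a - flow_b) ** 2))

def direction_range_consistency_loss(flow_a, flow_b, eps=1e-8):
    """Penalize disagreement in flow direction and magnitude.

    A hypothetical reading of the direction-and-range consistency (DRC)
    idea: the cosine term vanishes when directions agree, and the range
    term vanishes when magnitudes agree.
    """
    dot = np.sum(flow_a * flow_b, axis=-1)
    norms = np.linalg.norm(flow_a, axis=-1) * np.linalg.norm(flow_b, axis=-1)
    direction = 1.0 - dot / (norms + eps)               # 0 when directions match
    rng = np.abs(np.linalg.norm(flow_a, axis=-1)
                 - np.linalg.norm(flow_b, axis=-1))     # magnitude gap
    return float(np.mean(direction + rng))

# Identical predictions incur essentially zero consistency penalty.
flow = np.ones((4, 4, 2))
assert regression_consistency_loss(flow, flow) == 0.0
assert direction_range_consistency_loss(flow, flow) < 1e-6
```

In a semi-supervised loop, such terms would be summed with the supervised flow loss on labeled images, letting unlabeled images contribute a training signal through agreement alone.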
Related papers
- Rethinking Transformers Pre-training for Multi-Spectral Satellite
Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view
Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Conditioning Generative Latent Optimization for Sparse-View CT Image
Reconstruction [0.5497663232622965]
We propose an unsupervised conditional approach to the Generative Latent
Optimization framework (cGLO).
The approach is tested on full-dose sparse-view CT using multiple training dataset sizes and varying numbers of viewing angles.
arXiv Detail & Related papers (2023-07-31T13:47:33Z)
- Evaluating the Label Efficiency of Contrastive Self-Supervised Learning for
Multi-Resolution Satellite Imagery [0.0]
Self-supervised learning has been applied in the remote sensing domain to exploit readily-available unlabeled data.
In this paper, we study self-supervised visual representation learning through the lens of label efficiency.
arXiv Detail & Related papers (2022-10-13T06:54:13Z)
- Scale Attention for Learning Deep Face Representation: A Study Against
Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Dispensed Transformer Network for Unsupervised Domain Adaptation [21.256375606219073]
A novel unsupervised domain adaptation (UDA) method named dispensed Transformer network (DTNet) is introduced in this paper.
Our proposed network achieves the best performance in comparison with several state-of-the-art techniques.
arXiv Detail & Related papers (2021-10-28T08:27:44Z)
- Semi-weakly Supervised Contrastive Representation Learning for Retinal
Fundus Images [0.2538209532048867]
We propose a semi-weakly supervised contrastive learning framework for representation learning using semi-weakly annotated images.
We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets.
arXiv Detail & Related papers (2021-08-04T15:50:09Z)
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution
Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the input images, which leads to degraded accuracy in cross-resolution cases.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
- Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and easily generalizes to different tasks.
Tested on ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, seamlessly works on both small and large batch size training, and applies to different computer vision tasks of image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves distributional-shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification
of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.