Scale Attention for Learning Deep Face Representation: A Study Against
Visual Scale Variation
- URL: http://arxiv.org/abs/2209.08788v1
- Date: Mon, 19 Sep 2022 06:35:04 GMT
- Title: Scale Attention for Learning Deep Face Representation: A Study Against
Visual Scale Variation
- Authors: Hailin Shi, Hang Du, Yibo Hu, Jun Wang, Dan Zeng, Ting Yao
- Abstract summary: We reform the conv layer by resorting to the scale-space theory.
We build a novel network style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human face images usually appear with a wide range of visual scales. The
existing face representations handle scale variation via a multi-scale scheme
that assembles a finite series of predefined scales. Such a multi-shot scheme
adds inference burden, and the predefined scales inevitably deviate from real
data. Instead, learning scale parameters from data, and using them for one-shot
feature inference, is a better solution. To
this end, we reform the conv layer by drawing on scale-space theory, and
obtain two benefits: 1) the conv layer learns a set of scales from the real
data distribution, each fulfilled by a conv kernel; 2) the layer
automatically highlights the feature at the channel and location that
correspond to the scale of the input pattern and its presence. We then build
hierarchical scale attention by stacking the reformed layers, forming a
novel network style named SCale AttentioN Conv Neural Network (SCAN-CNN). We
apply SCAN-CNN to the face recognition task and push the frontier of SOTA
performance. The accuracy gain is more evident when the face images are blurry.
Meanwhile, as a single-shot scheme, inference is more efficient than
multi-shot fusion. A set of tools is provided to ensure fast training of
SCAN-CNN with zero increase in inference cost compared with a plain CNN.
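To make the abstract's idea concrete, here is a minimal 1-D sketch of a scale-attention conv step: one Gaussian kernel per learned scale (the scale-space construction), with a soft attention over the per-scale responses at every location. The toy signal, the scale values, and the energy-based attention weighting are all assumptions for illustration, not the paper's actual implementation.

```python
import math

def gaussian_kernel(sigma, radius=3):
    """Discrete Gaussian kernel for one learned scale sigma (scale-space theory)."""
    vals = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]

def conv1d(signal, kernel):
    """'Same' 1-D convolution with zero padding."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def scale_attention_conv(signal, sigmas):
    """Apply one kernel per learned scale, then softly attend over the
    per-scale responses at each location (single-shot, no multi-scale fusion)."""
    responses = [conv1d(signal, gaussian_kernel(s)) for s in sigmas]
    out = []
    for i in range(len(signal)):
        # Softmax attention over scales, driven here by response energy (a toy choice).
        energies = [r[i] * r[i] for r in responses]
        exps = [math.exp(e) for e in energies]
        z = sum(exps)
        out.append(sum((w / z) * r[i] for w, r in zip(exps, responses)))
    return out

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
blended = scale_attention_conv(signal, sigmas=[0.5, 1.0, 2.0])
```

The key property mirrored here is that a single forward pass produces a scale-adaptive response, instead of fusing several fixed-scale passes.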
Related papers
- Scale Propagation Network for Generalizable Depth Completion [16.733495588009184]
We propose a novel scale propagation normalization (SP-Norm) method to propagate scales from input to output.
We also develop a new network architecture based on SP-Norm and the ConvNeXt V2 backbone.
Our model consistently achieves the best accuracy with faster speed and lower memory when compared to state-of-the-art methods.
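The claim above is that ordinary normalization discards the input's scale, which SP-Norm aims to propagate. The following is an illustrative stand-in (not the paper's SP-Norm): it normalizes the pattern, then re-injects the input's RMS magnitude so scale information survives to the output.

```python
def standard_norm(x, eps=1e-6):
    """Zero-mean, unit-variance normalization: the input's scale is discarded."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / (var + eps) ** 0.5 for v in x]

def scale_propagating_norm(x):
    """Illustrative: normalize the pattern, then multiply back the input's
    RMS so absolute magnitude propagates from input to output."""
    rms = (sum(v * v for v in x) / len(x)) ** 0.5
    return [v * rms for v in standard_norm(x)]

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same pattern at 10x the scale
# standard_norm maps a and b to (nearly) identical outputs;
# scale_propagating_norm keeps them about 10x apart.
```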
arXiv Detail & Related papers (2024-10-24T03:53:06Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that splits the support and query samples into patches.
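The summary describes two-way attention between support and query patches. Below is a hedged sketch of that idea using plain softmax dot-product attention; the patch features, dimensions, and function names are hypothetical.

```python
import math

def attend(queries, keys):
    """Softmax dot-product attention: each query patch aggregates the key patches."""
    out = []
    for q in queries:
        scores = [sum(qa * ka for qa, ka in zip(q, k)) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum((e / z) * k[d] for e, k in zip(exps, keys))
                    for d in range(len(q))])
    return out

def intra_task_mutual_attention(support_patches, query_patches):
    """Mutual (two-way) attention between support and query patches -- an
    illustrative reading of the abstract, not the paper's exact module."""
    support_attended = attend(support_patches, query_patches)
    query_attended = attend(query_patches, support_patches)
    return support_attended, query_attended

support = [[1.0, 0.0], [0.0, 1.0]]                 # two toy support patches
query = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]       # three toy query patches
s_att, q_att = intra_task_mutual_attention(support, query)
```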
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose the Multi-scale Unified Network (MUSN), consisting of a multi-scale design, a unified network, and a scale-invariance constraint.
MUSN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
arXiv Detail & Related papers (2024-03-27T06:40:26Z)
- Scale-Equivariant UNet for Histopathology Image Segmentation [1.213915839836187]
Convolutional Neural Networks (CNNs) trained on histopathology images at a given scale fail to generalise to images at different scales.
We propose the Scale-Equivariant UNet (SEUNet) for image segmentation by building on scale-space theory.
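One standard scale-space construction (which SEUNet builds on, per the summary) is to share a single kernel across several dilations, producing a response per scale. The sketch below is a generic 1-D illustration of that construction, not SEUNet's actual layers.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Same' 1-D convolution with a dilated kernel and zero padding."""
    c = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - c) * dilation
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def scale_equivariant_conv(signal, kernel, dilations=(1, 2, 4)):
    """Share one kernel across dilations; stacking the responses along a
    scale axis is the basic scale-space lifting (illustrative sketch)."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]
```

Because the same weights act at every dilation, a rescaled input pattern produces (approximately) the same response shifted along the scale axis, which is the equivariance property the title refers to.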
arXiv Detail & Related papers (2023-04-10T14:03:08Z)
- Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning [55.762840052788945]
We present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales.
We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery.
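The low/high frequency reconstruction targets mentioned above can be obtained from a simple band split: smooth the input to get the low-frequency part, and take the residual as the high-frequency part. The 1-D version below is an assumption-laden toy, not Scale-MAE's actual decomposition.

```python
def low_high_split(signal, kernel=(0.25, 0.5, 0.25)):
    """Split a signal into a low-frequency (smoothed) part and the
    high-frequency residual -- two reconstruction targets of the kind
    the summary describes. Edges are handled by clamping."""
    r = len(kernel) // 2
    low = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), len(signal) - 1)
            acc += w * signal[j]
        low.append(acc)
    high = [s - l for s, l in zip(signal, low)]
    return low, high
```

By construction, low + high reconstructs the input exactly, so supervising both bands forces the network to model coarse structure and fine detail separately.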
arXiv Detail & Related papers (2022-12-30T03:15:34Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning [54.13876727413492]
In many real-world scenarios of face recognition, the depth of training dataset is shallow, which means only two face images are available for each ID.
With the non-uniform increase of samples, this issue becomes a more general case, i.e., long-tail face learning.
Based on Semi-Siamese Training (SST), we introduce an advanced solution named Multi-Agent Semi-Siamese Training (MASST).
MASST includes a probe network and multiple gallery agents; the former encodes the probe features, and the latter constitutes a stack of
arXiv Detail & Related papers (2021-05-10T04:57:32Z)
- Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and generalizes easily to different tasks.
Tested on ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, seamlessly works on both small and large batch size training, and applies to different computer vision tasks of image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z)
- Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length.
It brings a great benefit by scaling dimensions of depth/width/resolution/patch size without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
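The "progressively pools visual tokens" idea above can be pictured as repeatedly averaging adjacent tokens so the sequence length shrinks stage by stage. This is a minimal sketch of that pooling step under assumed toy token features, not HVT's actual module.

```python
def hierarchical_pool(tokens, stride=2):
    """Average each group of `stride` adjacent visual tokens, shrinking the
    sequence length by that factor at every stage (illustrative)."""
    pooled = []
    for i in range(0, len(tokens) - stride + 1, stride):
        group = tokens[i:i + stride]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / stride for d in range(dim)])
    return pooled

tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
stage1 = hierarchical_pool(tokens)   # 4 tokens -> 2
stage2 = hierarchical_pool(stage1)   # 2 tokens -> 1
```

Since self-attention cost grows quadratically in sequence length, halving the tokens at each stage is where the claimed efficiency comes from.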
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not exhibit predictable recognition behavior with respect to changes in input resolution.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
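The meta-learner idea above, a small network that emits the main network's conv weights as a function of the input scale, can be sketched as a tiny hypernetwork. The mapping, kernel size, and (untrained) meta parameters below are all hypothetical placeholders.

```python
import math

def meta_kernel(scale, meta_w, meta_b):
    """Hypothetical meta learner: map an input-scale scalar to the weights of
    a 3-tap conv kernel, so the main network gets scale-specific weights
    without storing one model per input resolution."""
    return [math.tanh(w * scale + b) for w, b in zip(meta_w, meta_b)]

# Toy (untrained) meta parameters for a 3-tap kernel.
meta_w = [0.2, -0.1, 0.3]
meta_b = [0.0, 0.5, -0.2]
k_small = meta_kernel(0.5, meta_w, meta_b)   # kernel generated for low-resolution input
k_large = meta_kernel(2.0, meta_w, meta_b)   # kernel generated for high-resolution input
```

In the paper's setup the meta learner would be trained jointly with the main network; here the point is only that different input scales yield different generated weights.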
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.