EResFD: Rediscovery of the Effectiveness of Standard Convolution for
Lightweight Face Detection
- URL: http://arxiv.org/abs/2204.01209v3
- Date: Thu, 2 Nov 2023 10:17:58 GMT
- Title: EResFD: Rediscovery of the Effectiveness of Standard Convolution for
Lightweight Face Detection
- Authors: Joonhyun Jeong, Beomyoung Kim, Joonsang Yu, Youngjoon Yoo
- Abstract summary: We re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection.
We show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed.
Our proposed detector EResFD obtained 80.4% mAP on WIDER FACE Hard subset which only takes 37.7 ms for VGA image inference on CPU.
- Score: 13.357235715178584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper analyzes the design choices of face detection architecture that
improve efficiency of computation cost and accuracy. Specifically, we
re-examine the effectiveness of the standard convolutional block as a
lightweight backbone architecture for face detection. Unlike the current
tendency of lightweight architecture design, which heavily utilizes depthwise
separable convolution layers, we show that heavily channel-pruned standard
convolution layers can achieve better accuracy and inference speed when using a
similar parameter size. This observation is supported by the analyses
concerning the characteristics of the target data domain, faces. Based on our
observation, we propose to employ ResNet with a highly reduced channel, which
surprisingly allows high efficiency compared to other mobile-friendly networks
(e.g., MobileNetV1, V2, V3). From the extensive experiments, we show that the
proposed backbone can replace that of the state-of-the-art face detector with a
faster inference speed. Also, we further propose a new feature aggregation
method to maximize the detection performance. Our proposed detector EResFD
obtained 80.4% mAP on WIDER FACE Hard subset which only takes 37.7 ms for VGA
image inference on CPU. Code is available at https://github.com/clovaai/EResFD.
Related papers
- Global Context Aggregation Network for Lightweight Saliency Detection of
Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z) - EfficientFace: An Efficient Deep Network with Feature Enhancement for
Accurate Face Detection [20.779512288834315]
Current lightweight CNN-based face detectors trading accuracy for efficiency have inadequate capability in handling insufficient feature representation.
We design an efficient deep face detector termed EfficientFace in this study, which contains three modules for feature enhancement.
We have evaluated EfficientFace on four public benchmarks and experimental results demonstrate the appealing performance of our method.
arXiv Detail & Related papers (2023-02-23T06:59:45Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss who can achieve trend-level alignment with SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - Simple Training Strategies and Model Scaling for Object Detection [38.27709720726833]
We benchmark improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors.
The vanilla detectors are improved by 7.7% in accuracy while being 30% faster in speed.
Our largest Cascade RCNN-RS models achieve 52.9% AP with a ResNet152-FPN backbone and 53.6% with a SpineNet143L backbone.
arXiv Detail & Related papers (2021-06-30T18:41:47Z) - Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
scrfdf34 outperforms the best competitor, TinaFace, by $3.86%$ (AP at hard set) while being more than emph3$times$ faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z) - An Efficient Multitask Neural Network for Face Alignment, Head Pose
Estimation and Face Tracking [9.39854778804018]
We propose an efficient multitask face alignment, face tracking and head pose estimation network (ATPN)
ATPN achieves improved performance compared to previous state-of-the-art methods while having less number of parameters and FLOPS.
arXiv Detail & Related papers (2021-03-13T04:41:15Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - Learning Robust Feature Representations for Scene Text Detection [0.0]
We present a network architecture derived from the loss to maximize conditional log-likelihood.
By extending the layer of latent variables to multiple layers, the network is able to learn robust features on scale.
In experiments, the proposed algorithm significantly outperforms state-of-the-art methods in terms of both recall and precision.
arXiv Detail & Related papers (2020-05-26T01:06:47Z) - ASFD: Automatic and Scalable Face Detector [129.82350993748258]
We propose a novel Automatic and Scalable Face Detector (ASFD)
ASFD is based on a combination of neural architecture search techniques as well as a new loss design.
Our ASFD-D6 outperforms the prior strong competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with Mobilenet for VGA-resolution images.
arXiv Detail & Related papers (2020-03-25T06:00:47Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.