ASFD: Automatic and Scalable Face Detector
- URL: http://arxiv.org/abs/2003.11228v3
- Date: Tue, 31 Mar 2020 16:09:40 GMT
- Title: ASFD: Automatic and Scalable Face Detector
- Authors: Bin Zhang, Jian Li, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li,
Feiyue Huang, Yili Xia, Wenjiang Pei, Rongrong Ji
- Abstract summary: We propose a novel Automatic and Scalable Face Detector (ASFD).
ASFD is based on a combination of neural architecture search techniques as well as a new loss design.
Our ASFD-D6 outperforms the prior strong competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with Mobilenet for VGA-resolution images.
- Score: 129.82350993748258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel Automatic and Scalable Face Detector
(ASFD), which is based on a combination of neural architecture search
techniques as well as a new loss design. First, we propose an automatic feature
enhancement module named Auto-FEM, obtained via an improved differentiable
architecture search, which enables efficient multi-scale feature fusion and
context enhancement.
Second, we use Distance-based Regression and Margin-based Classification (DRMC)
multi-task loss to predict accurate bounding boxes and learn highly
discriminative deep features. Third, we adopt compound scaling methods and
uniformly scale the backbone, feature modules, and head networks to develop a
family of ASFD models, which are consistently more efficient than state-of-the-art
face detectors. Extensive experiments conducted on popular benchmarks, e.g.,
WIDER FACE and FDDB, demonstrate that our ASFD-D6 outperforms the prior strong
competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with
Mobilenet for VGA-resolution images.
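The abstract's compound scaling applies a single coefficient uniformly to the backbone, feature modules, and head networks. A minimal sketch of how such EfficientNet-style compound scaling can generate a D0–D6 family; the base sizes and the depth/width/resolution factors below are illustrative assumptions, not the values used by ASFD:

```python
# Hedged sketch of EfficientNet-style compound scaling for a detector
# family. ALPHA/BETA/GAMMA and the base sizes are placeholder values.
import math

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth / width / resolution factors

def compound_scale(phi: int, base_depth: int, base_width: int, base_res: int):
    """Scale depth, width, and input resolution by one coefficient phi."""
    depth = int(math.ceil(base_depth * ALPHA ** phi))
    width = int(math.ceil(base_width * BETA ** phi))
    # Snap the resolution to a multiple of 32 to stay stride-friendly.
    res = int(round(base_res * GAMMA ** phi / 32) * 32)
    return depth, width, res

# A family D0..D6 is obtained by sweeping phi from 0 to 6.
family = {f"D{phi}": compound_scale(phi, base_depth=2, base_width=64, base_res=640)
          for phi in range(7)}
```

The appeal of a single coefficient is that one knob trades accuracy for speed across the whole family, rather than hand-tuning each model size.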
Related papers
- Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection [17.406051477690134]
Event cameras output sparse and asynchronous events, offering a potential solution to the limitations of conventional frame-based cameras.
We propose a novel hierarchical feature refinement network for event-frame fusion.
Our method exhibits significantly better robustness when 15 different corruption types are introduced to the frame images.
arXiv Detail & Related papers (2024-07-17T14:09:46Z)
- Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection [6.367999777464464]
Multimodal deepfake detectors typically rely on conventional fusion methods, such as majority rule and ensemble voting.
In this paper, we introduce the Straight-through Gumbel-Softmax framework, offering a comprehensive approach to search multimodal fusion model architectures.
Experiments on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC of 94.4%, achieved with minimal model parameters.
arXiv Detail & Related papers (2024-06-19T09:26:22Z)
- Neural Networks with A La Carte Selection of Activation Functions [0.0]
Activation functions (AFs) are pivotal to the success (or failure) of a neural network.
We combine a variety of known AFs into successful architectures, proposing three methods for doing so beneficially.
We show that all three methods often produce significantly better results on 25 classification problems compared with a standard network composed of ReLU hidden units and a softmax output unit.
arXiv Detail & Related papers (2022-06-24T09:09:39Z)
- ASFD: Automatic and Scalable Face Detector [59.31799101216593]
We propose to search an effective FAE architecture, termed AutoFAE, which outperforms all existing FAE modules in face detection with a considerable margin.
In particular, our strong ASFD-D6 outperforms the best competitor with AP 96.7/96.2/92.1 on the WIDER Face test set, and the lightweight ASFD-D0 costs about 3.1 ms per image, i.e., more than 320 FPS.
arXiv Detail & Related papers (2022-01-26T07:11:51Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach that efficiently tackles this challenging task by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which is important for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model outperforms prior methods in accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin.
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The main challenge in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
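The SepConvLSTM entry above is built by replacing the convolution at each ConvLSTM gate with a depthwise separable convolution, whose main payoff is a large parameter reduction. A minimal sketch of that saving; the channel sizes and kernel size are illustrative assumptions, not the paper's configuration:

```python
# Hedged sketch comparing the weight count of a standard convolution with
# the depthwise separable convolution that SepConvLSTM substitutes into
# each ConvLSTM gate. Shapes below are illustrative, not the paper's.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def sep_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# A ConvLSTM cell has four gates (input, forget, output, candidate);
# each gate convolves the concatenated [input, hidden] tensor.
c_in, c_hidden, k, gates = 64, 64, 3, 4
standard = gates * conv_params(c_in + c_hidden, c_hidden, k)
separable = gates * sep_conv_params(c_in + c_hidden, c_hidden, k)
print(f"standard ConvLSTM gates: {standard:,} weights")
print(f"SepConvLSTM gates:       {separable:,} weights")
```

With these illustrative shapes the separable variant uses roughly an eighth of the weights, which is what makes the two-stream model light enough to pair with a MobileNet backbone.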
This list is automatically generated from the titles and abstracts of the papers in this site.