Computational efficient deep neural network with difference attention maps for facial action unit detection
- URL: http://arxiv.org/abs/2011.12082v2
- Date: Fri, 27 Nov 2020 03:23:50 GMT
- Title: Computational efficient deep neural network with difference attention maps for facial action unit detection
- Authors: Jing Chen, Chenhui Wang, Kejun Wang, Meichen Liu
- Abstract summary: We propose a computationally efficient, end-to-end trained deep neural network (CEDNN) model and spatial attention maps based on difference images.
Extensive experimental results show that CEDNN clearly outperforms traditional deep learning methods on the DISFA+ and CK+ datasets.
- Score: 3.73202122588308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a computationally efficient, end-to-end
trained deep neural network (CEDNN) model and spatial attention maps based on
difference images. First, a difference image is generated by image processing.
Then five binary images are obtained from the difference image using different
thresholds, and these serve as spatial attention maps. We use group convolution
to reduce model complexity. Skip connections and $1 \times 1$ convolutions are
used to ensure good performance even though the network is not deep. The
spatial attention maps can be selectively fed as an additional input to each
block, which helps the feature maps focus on the regions related to the target
task. In addition, only the parameters of the classifier need to be adjusted to
train for a different number of action units (AUs), so the model can be extended
to other datasets without much additional computation. Extensive experimental
results show that CEDNN clearly outperforms traditional deep learning methods
on the DISFA+ and CK+ datasets. With the spatial attention maps added, its
results surpass those of state-of-the-art AU detection methods. At the same
time, the network is small, runs fast, and has low hardware requirements.
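To make the complexity argument concrete: a $k \times k$ convolution from $C_{in}$ to $C_{out}$ channels has $k^2 C_{in} C_{out}$ weights, and splitting it into $g$ groups divides that by $g$, because each output channel only sees $C_{in}/g$ input channels. The sketch below counts weights and implements a naive grouped $3 \times 3$ convolution followed by a $1 \times 1$ convolution and an identity skip connection in NumPy. The block layout (grouped conv, ReLU, $1 \times 1$ conv, add input) is a hypothetical arrangement chosen for illustration; the abstract does not specify the exact CEDNN block. One common motivation for following a grouped convolution with a $1 \times 1$ convolution is that the latter lets channels from different groups exchange information.

```python
import numpy as np


def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Weight count of a k x k convolution split into `groups` groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each of the c_out filters spans only c_in // groups input channels.
    return k * k * (c_in // groups) * c_out + (c_out if bias else 0)


def grouped_conv3x3(x, w, groups):
    """Naive zero-padded ('same') 3x3 grouped convolution (cross-correlation,
    as in common deep learning frameworks).
    x: (C_in, H, W); w: (C_out, C_in // groups, 3, 3)."""
    c_in, h, wd = x.shape
    c_out = w.shape[0]
    cin_g, cout_g = c_in // groups, c_out // groups
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        g = o // cout_g                       # group of this output channel
        xs = xp[g * cin_g:(g + 1) * cin_g]    # only that group's input channels
        for dy in range(3):
            for dx in range(3):
                out[o] += np.einsum("c,chw->hw", w[o, :, dy, dx],
                                    xs[:, dy:dy + h, dx:dx + wd])
    return out


def conv1x1(x, w):
    """1x1 convolution: mixes all channels at each pixel. w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def block(x, w_group, w_point, groups):
    """HYPOTHETICAL block: grouped 3x3 conv -> ReLU -> 1x1 conv, plus an
    identity skip connection (requires the 1x1 conv to output C_in channels)."""
    y = np.maximum(grouped_conv3x3(x, w_group, groups), 0.0)
    return x + conv1x1(y, w_point)


if __name__ == "__main__":
    # 64 -> 64 channels, 3x3 kernel: standard vs. 8 groups vs. a 1x1 conv.
    print(conv_params(64, 64, 3), conv_params(64, 64, 3, groups=8),
          conv_params(64, 64, 1))  # 36864 4608 4096
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8, 8))
    w_group = rng.standard_normal((16, 16 // 4, 3, 3)) * 0.1
    w_point = rng.standard_normal((16, 16)) * 0.1
    print(block(x, w_group, w_point, groups=4).shape)  # (16, 8, 8)
```

In this 64-channel example, eight groups cut the $3 \times 3$ layer from 36,864 to 4,608 weights, and the added $1 \times 1$ layer costs another 4,096, still well under the ungrouped layer.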
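The abstract states that the attention maps can be selectively fed into each block and that only the classifier changes with the number of AUs. A minimal NumPy sketch of one way to do both, under assumptions not confirmed by the source: the maps are average-pooled to the block's spatial size and concatenated as extra input channels (the `use_attention` flag stands in for "selectively"), and the hypothetical `au_head` uses global average pooling, a linear layer, and a per-AU sigmoid, since several AUs can be active at once.

```python
import numpy as np


def downsample_avg(maps, factor):
    """Average-pool (K, H, W) maps by an integer factor so they match a
    deeper block's spatial size."""
    k, h, w = maps.shape
    assert h % factor == 0 and w % factor == 0
    return maps.reshape(k, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def with_attention(features, maps, use_attention=True):
    """Optionally append the attention maps as extra channels of a block input.
    ASSUMPTION: concatenation; the abstract does not say how maps are fused."""
    if not use_attention:
        return features
    factor = maps.shape[1] // features.shape[1]
    return np.concatenate([features, downsample_avg(maps, factor)], axis=0)


def au_head(features, weight, bias):
    """HYPOTHETICAL classifier: global average pooling, a linear layer, and a
    sigmoid giving one independent probability per AU.
    features: (C, H, W); weight: (num_aus, C); bias: (num_aus,)."""
    pooled = features.mean(axis=(1, 2))
    logits = weight @ pooled + bias
    return 1.0 / (1.0 + np.exp(-logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = (rng.random((5, 64, 64)) > 0.5).astype(np.float64)
    feats = rng.standard_normal((32, 16, 16))
    print(with_attention(feats, maps).shape)       # (37, 16, 16)
    final = rng.standard_normal((64, 4, 4))        # stand-in for the last block's output
    # Same backbone output, two heads: only the head's shape depends on the AU count.
    for num_aus in (12, 8):
        w = rng.standard_normal((num_aus, 64)) * 0.01
        print(num_aus, au_head(final, w, np.zeros(num_aus)).shape)
```

Under these assumptions, moving to a dataset with a different set of AUs only changes the first dimension of `weight` and `bias`; the backbone and the attention inputs are untouched.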
Related papers
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
(2024-02-25T13:22:17Z)
- Training Convolutional Neural Networks with the Forward-Forward algorithm [1.74440662023704]
The Forward-Forward (FF) algorithm has so far only been used in fully connected networks.
We show how the FF paradigm can be extended to CNNs.
Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST hand-written digits dataset.
(2023-12-22T18:56:35Z)
- Unsupervised convolutional neural network fusion approach for change detection in remote sensing images [1.892026266421264]
We introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection.
Our model has three features: the entire training process is conducted in an unsupervised manner, the network architecture is shallow, and the objective function is sparse.
Experimental results on four real remote sensing datasets indicate the feasibility and effectiveness of the proposed approach.
(2023-11-07T03:10:17Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
(2023-07-20T16:00:19Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by drawing on scale-space theory.
On this basis, we build a novel architecture named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, its inference is more efficient than multi-shot fusion.
(2022-09-19T06:35:04Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
(2022-04-19T10:41:09Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
(2020-08-19T13:13:01Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
(2020-07-14T16:43:02Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
(2020-03-11T08:45:53Z)
- On the Texture Bias for Few-Shot CNN Segmentation [21.349705243254423]
Although Convolutional Neural Networks (CNNs) were initially believed to be driven by shapes when performing visual recognition tasks, recent evidence suggests that texture bias in CNNs yields higher-performing models when learning on large labeled training datasets.
We propose a novel architecture that integrates a set of Difference of Gaussians (DoG) to attenuate high-frequency local components in the feature space.
(2020-03-09T11:55:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.