Related papers: Revisiting Facial Key Point Detection: An Efficient Approach Using Deep Neural Networks

Revisiting Facial Key Point Detection: An Efficient Approach Using Deep Neural Networks

URL: http://arxiv.org/abs/2205.07121v1
Date: Sat, 14 May 2022 19:49:03 GMT
Title: Revisiting Facial Key Point Detection: An Efficient Approach Using Deep Neural Networks
Authors: Prathima Dileep, Bharath Kumar Bolla, Sabeesh Ethiraj
Abstract summary: We develop efficient deep learning models in terms of model size, parameters, and inference time. MobileNetV2 architecture produced the lowest RMSE and inference time. manually optimized CNN architectures performed similarly to Auto Keras tuned architecture.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Facial landmark detection is a widely researched field of deep learning as this has a wide range of applications in many fields. These key points are distinguishing characteristic points on the face, such as the eyes center, the eye's inner and outer corners, the mouth center, and the nose tip from which human emotions and intent can be explained. The focus of our work has been evaluating transfer learning models such as MobileNetV2 and NasNetMobile, including custom CNN architectures. The objective of the research has been to develop efficient deep learning models in terms of model size, parameters, and inference time and to study the effect of augmentation imputation and fine-tuning on these models. It was found that while augmentation techniques produced lower RMSE scores than imputation techniques, they did not affect the inference time. MobileNetV2 architecture produced the lowest RMSE and inference time. Moreover, our results indicate that manually optimized CNN architectures performed similarly to Auto Keras tuned architecture. However, manually optimized architectures yielded better inference time and training curves.

Related papers

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
Deep Learning Approaches for Human Action Recognition in Video Data [0.8080830346931087]
This study conducts an in-depth analysis of various deep learning models to address this challenge. We focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Two-Stream ConvNets. The results of this study underscore the potential of composite models in achieving robust human action recognition.
arXiv Detail & Related papers (2024-03-11T15:31:25Z)
Efficient and Accurate Hyperspectral Image Demosaicing with Neural Network Architectures [3.386560551295746]
This study investigates the effectiveness of neural network architectures in hyperspectral image demosaicing. We introduce a range of network models and modifications, and compare them with classical methods and existing reference network approaches. Results indicate that our networks outperform or match reference models in both datasets demonstrating exceptional performance.
arXiv Detail & Related papers (2023-12-21T08:02:49Z)
Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks. Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios. New research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks. Firstly, we will present the theoretical background of the Bag of Visual words model and Deep Convolutional Neural Networks (DCNN) Secondly, we will implement a Bag of Visual Words model, the VGG16 CNN Architecture.
arXiv Detail & Related papers (2022-04-11T11:34:43Z)
Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose to apply neural architecture search (NAS) techniques to automatically configure the SER models. We show that NAS can improve SER performance (54.89% to 56.28%) while maintaining model parameter sizes.
arXiv Detail & Related papers (2022-03-31T10:16:10Z)
Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance. We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation. We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.
arXiv Detail & Related papers (2022-02-01T19:03:03Z)
Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units. RBF units capture local patterns shared by similar instances using an intermediate representation. We show it is the incorporation of local information what makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
Scene Understanding for Autonomous Driving [0.0]
We study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2. We observe a significant improvement in performance after fine-tuning these models on the datasets of interest. We run inference in unusual situations using out of context datasets, and present interesting results.
arXiv Detail & Related papers (2021-05-11T09:50:05Z)
A Compact Deep Architecture for Real-time Saliency Prediction [42.58396452892243]
Saliency models aim to imitate the attention mechanism in the human visual system. Deep models have a high number of parameters which makes them less suitable for real-time applications. Here we propose a compact yet fast model for real-time saliency prediction.
arXiv Detail & Related papers (2020-08-30T17:47:16Z)
A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures. A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
Inferring Convolutional Neural Networks' accuracies from their architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance. We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems. We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.