Improving Depression estimation from facial videos with face alignment,
training optimization and scheduling
- URL: http://arxiv.org/abs/2212.06400v1
- Date: Tue, 13 Dec 2022 06:46:38 GMT
- Title: Improving Depression estimation from facial videos with face alignment,
training optimization and scheduling
- Authors: Manuel Lage Cañellas, Constantino Álvarez Casado, Le Nguyen,
Miguel Bordallo López
- Abstract summary: We propose two models based on ResNet-50 that use only static spatial information, enhanced with two specific face alignment methods.
Our experiments on benchmark datasets obtain results similar to sophisticated spatio-temporal models for single streams, while the score-level fusion of two different streams outperforms state-of-the-art methods.
- Score: 0.3441021278275805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have shown promising results in recognizing depressive
states using video-based facial expressions. While successful models typically
leverage 3D-CNNs or video distillation techniques, the different use of
pretraining, data augmentation, preprocessing, and optimization techniques
across experiments makes it difficult to make fair architectural comparisons.
We propose instead to enhance two simple models based on ResNet-50 that use
only static spatial information by using two specific face alignment methods
and improved data augmentation, optimization, and scheduling techniques. Our
extensive experiments on benchmark datasets obtain similar results to
sophisticated spatio-temporal models for single streams, while the score-level
fusion of two different streams outperforms state-of-the-art methods. Our
findings suggest that specific modifications in the preprocessing and training
process result in noticeable differences in the performance of the models and
could mask gains originally attributed to the use of different neural
network architectures.
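The score-level fusion mentioned in the abstract can be sketched as a weighted average of the per-video predictions from the two aligned streams. The function below is an illustrative assumption about what "score-level fusion" means here, not the paper's actual implementation; the weights, score values, and equal-weight default are all made up.

```python
import numpy as np

def score_level_fusion(scores_a, scores_b, weight_a=0.5):
    """Fuse per-video depression-score predictions from two streams.

    Hypothetical sketch: scores_a and scores_b would come from two
    ResNet-50 streams that differ only in their face-alignment
    preprocessing; weight_a balances the two streams.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return weight_a * scores_a + (1.0 - weight_a) * scores_b

# Made-up per-video predictions from each alignment stream.
stream_a = [12.0, 7.5, 20.0]
stream_b = [10.0, 8.5, 18.0]
fused = score_level_fusion(stream_a, stream_b)
```

With equal weights this reduces to a simple average of the two streams' scores; in practice the weight could be tuned on a validation split.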
Related papers
- VDPI: Video Deblurring with Pseudo-inverse Modeling [8.91065618315995]
Video deblurring is a challenging task that aims to recover sharp sequences from blurry and noisy observations.
Image-formation model plays a crucial role in traditional model-based methods, constraining the possible solutions.
This paper proposes introducing knowledge of the image-formation model into a deep learning network by using the pseudo-inverse of the blur.
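The pseudo-inverse idea above can be illustrated with a toy 1-D example: model the blur as a linear operator A, then use the pseudo-inverse estimate pinv(A) @ y (rather than the raw blurry observation y) as the physics-informed input to a network. This is only a minimal numpy sketch of the concept; the actual VDPI method operates on video frames with a learned network.

```python
import numpy as np

def blur_matrix(n, kernel):
    """Dense convolution matrix for a small 1-D blur kernel (zero-padded)."""
    k = len(kernel)
    A = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(kernel):
            col = i + j - k // 2
            if 0 <= col < n:
                A[i, col] = w
    return A

n = 16
kernel = np.array([0.25, 0.5, 0.25])      # toy blur kernel
x = np.sin(np.linspace(0, np.pi, n))      # "sharp" signal
A = blur_matrix(n, kernel)
y = A @ x                                 # blurred observation
x_pinv = np.linalg.pinv(A) @ y            # pseudo-inverse estimate
```

In the noise-free toy case the pseudo-inverse recovers the signal almost exactly; with noise it amplifies high frequencies, which is precisely why a learned network on top of this estimate is useful.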
arXiv Detail & Related papers (2024-09-01T16:44:21Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- AdaDiff: Adaptive Step Selection for Fast Diffusion [88.8198344514677]
We introduce AdaDiff, a framework designed to learn instance-specific step usage policies.
AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function.
Our approach achieves similar results in terms of visual quality compared to the baseline using a fixed 50 denoising steps.
arXiv Detail & Related papers (2023-11-24T11:20:38Z)
- Diffusion Model for Dense Matching [34.13580888014]
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term.
We propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms.
Our experimental results demonstrate significant performance improvements of our method over existing approaches.
arXiv Detail & Related papers (2023-05-30T14:58:24Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- MaxDropoutV2: An Improved Method to Drop out Neurons in Convolutional Neural Networks [0.39146761527401425]
We present an improved version of a supervised regularization technique called MaxDropoutV2.
Results show that the model runs faster than the standard version and, in most cases, provides more accurate results.
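The core MaxDropout idea, dropping the most strongly activated neurons during training rather than random ones, can be sketched as below. This is an illustrative simplification, not the paper's exact MaxDropoutV2 formulation; the threshold rule and drop rate are assumptions.

```python
import numpy as np

def max_dropout(activations, drop_rate=0.1):
    """Toy sketch of MaxDropout-style regularization.

    Normalize activations to [0, 1] and zero out those above the
    (1 - drop_rate) threshold, i.e. the strongest activations are
    dropped during training.
    """
    a = np.asarray(activations, dtype=float)
    norm = (a - a.min()) / (a.max() - a.min() + 1e-8)
    mask = norm <= (1.0 - drop_rate)
    return a * mask

acts = np.array([0.1, 0.9, 0.5, 1.0, 0.3])
out = max_dropout(acts, drop_rate=0.2)
```

Unlike standard dropout, which removes units uniformly at random, this targets the units the network currently relies on most, which is what makes it a stronger regularizer.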
arXiv Detail & Related papers (2022-03-05T13:41:56Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Deep Optimized Priors for 3D Shape Modeling and Reconstruction [38.79018852887249]
We introduce a new learning framework for 3D modeling and reconstruction.
We show that the proposed strategy effectively breaks the barriers constrained by the pre-trained priors.
arXiv Detail & Related papers (2020-12-14T03:56:31Z)
- Scalable Second Order Optimization for Deep Learning [34.12384996822749]
We present a scalable implementation of a second-order preconditioned method (concretely, a variant of full-matrix Adagrad).
Our novel design effectively utilizes the prevalent heterogeneous hardware architecture for training deep models, consisting of a multicore CPU coupled with multiple accelerator units.
We demonstrate superior performance compared to state-of-the-art on very large learning tasks such as machine translation with Transformers, language modeling with BERT, click-through rate prediction on Criteo, and image classification on ImageNet with ResNet-50.
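The core full-matrix Adagrad update underlying this line of work can be sketched densely as below. This is only a toy version of the preconditioning idea: the accumulator G sums gradient outer products, and the step is preconditioned by G^{-1/2}. The paper's scalable implementation uses factored statistics and heterogeneous hardware; none of that is reflected here.

```python
import numpy as np

def full_matrix_adagrad_step(w, grad, G, lr=0.1, eps=1e-8):
    """One dense full-matrix Adagrad step (toy sketch).

    G accumulates the sum of gradient outer products; the update is
    preconditioned by G^{-1/2}, computed via eigendecomposition.
    """
    G = G + np.outer(grad, grad)
    vals, vecs = np.linalg.eigh(G)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    w = w - lr * inv_sqrt @ grad
    return w, G

w = np.zeros(3)
G = np.zeros((3, 3))
grad = np.array([1.0, -2.0, 0.5])
w, G = full_matrix_adagrad_step(w, grad, G)
```

The eigendecomposition costs O(d^3) per step, which is exactly why a practical implementation must block or factor the preconditioner instead of materializing it for full-size deep models.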
arXiv Detail & Related papers (2020-02-20T20:51:33Z)
- Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences arising from its use.