A New Perspective on Smiling and Laughter Detection: Intensity Levels Matter
- URL: http://arxiv.org/abs/2403.02112v1
- Date: Mon, 4 Mar 2024 15:15:57 GMT
- Title: A New Perspective on Smiling and Laughter Detection: Intensity Levels Matter
- Authors: Hugo Bohy, Kevin El Haddad and Thierry Dutoit
- Abstract summary: We present a deep learning-based multimodal smile and laugh classification system.
We compare the use of audio- and vision-based models as well as a fusion approach.
We show that, as expected, fusion leads to better generalization on unseen data.
- Score: 4.493507573183109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smile and laugh detection systems have attracted a lot of attention in the
past decade, contributing to the improvement of human-agent interaction systems.
But very few have treated these expressions as distinct, even though no prior work
clearly establishes whether or not they belong to the same category. In this work, we
present a deep learning-based multimodal smile and laugh classification system
that treats them as two different entities. We compare the use of audio- and
vision-based models as well as a fusion approach. We show that, as expected,
fusion leads to better generalization on unseen data. We also present an
in-depth analysis of the behavior of these models across smile and laugh
intensity levels. These analyses show that the relationship between smiles and
laughs may be neither a simple binary one nor reducible to a single category,
so a more complex approach should be taken when dealing with them. We also
tackle the problem of limited resources by showing that transfer learning
allows the models to improve the detection of confusing intensity levels.
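As a rough illustration of the fusion approach described above, here is a minimal late-fusion sketch in PyTorch. The encoders, feature dimensions, and the three-class output (neutral/smile/laugh) are assumptions for illustration; the abstract does not specify the paper's actual architectures.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Hypothetical late-fusion model: each modality is encoded separately,
    then the embeddings are concatenated and classified into three classes
    (e.g. neutral, smile, laugh)."""

    def __init__(self, audio_dim=128, video_dim=128, num_classes=3):
        super().__init__()
        # Stand-in encoders; the paper's audio/vision backbones are not given here.
        self.audio_encoder = nn.Sequential(
            nn.Linear(40, audio_dim), nn.ReLU())   # e.g. 40 mel bands per frame
        self.video_encoder = nn.Sequential(
            nn.Linear(512, video_dim), nn.ReLU())  # e.g. pooled face features
        self.classifier = nn.Linear(audio_dim + video_dim, num_classes)

    def forward(self, audio_feats, video_feats):
        # Average over time so each clip yields one embedding per modality.
        a = self.audio_encoder(audio_feats).mean(dim=1)
        v = self.video_encoder(video_feats).mean(dim=1)
        return self.classifier(torch.cat([a, v], dim=-1))

model = LateFusionClassifier()
audio = torch.randn(8, 100, 40)   # (batch, frames, mel bands)
video = torch.randn(8, 25, 512)   # (batch, frames, face features)
logits = model(audio, video)      # (8, 3)
```

Concatenating per-modality embeddings before a single classifier is one simple way a fused model can generalize better than unimodal ones on unseen data.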
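The transfer-learning result can likewise be sketched: reuse an encoder pretrained on a larger corpus and train only a small head on the scarce intensity-labeled data. Everything below (module names, the five intensity levels, hyperparameters) is a hypothetical minimal setup, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

# Hypothetical transfer-learning setup: freeze a pretrained encoder and
# train only a new head on the small intensity-labeled dataset.
pretrained_encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU())  # stands in for a pretrained backbone
for param in pretrained_encoder.parameters():
    param.requires_grad = False  # keep pretrained weights fixed

intensity_head = nn.Linear(128, 5)  # assume five intensity levels

optimizer = torch.optim.Adam(intensity_head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 40)          # small batch of input features
y = torch.randint(0, 5, (16,))   # intensity labels
optimizer.zero_grad()
logits = intensity_head(pretrained_encoder(x))
loss = loss_fn(logits, y)
loss.backward()                  # gradients flow only into the head
optimizer.step()
```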
Related papers
- RelVAE: Generative Pretraining for few-shot Visual Relationship Detection [2.2230760534775915]
We present the first pretraining method for few-shot predicate classification that does not require any annotated relations.
We construct few-shot training splits and show quantitative experiments on the VG200 and VRD datasets.
arXiv Detail & Related papers (2023-11-27T19:08:08Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild [8.242747994568212]
It is unclear how the perception and annotation of laughter differ when it is annotated from other modalities, such as video, via the body movements that accompany laughter.
We ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance.
Our analysis of more than 4000 annotations acquired from 48 annotators revealed evidence of incongruity in the perception of laughter and its intensity across modalities.
arXiv Detail & Related papers (2022-11-02T00:18:08Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training (see the feedback-fusion sketch after this list).
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
- Do Self-Supervised and Supervised Methods Learn Similar Visual Representations? [3.1594831736896025]
We compare a contrastive self-supervised algorithm (SimCLR) to supervised learning on simple image data in a common architecture.
We find that the methods learn similar intermediate representations through dissimilar means, and that the representations diverge rapidly in the final few layers.
Our work particularly highlights the importance of the learned intermediate representations, and raises important questions for auxiliary task design.
arXiv Detail & Related papers (2021-10-01T16:51:29Z)
- Improved Xception with Dual Attention Mechanism and Feature Fusion for Face Forgery Detection [6.718457497370086]
Face forgery detection has become a research hotspot in recent years.
We propose an improved Xception with dual attention mechanism and feature fusion for face forgery detection.
Experimental results on three Deepfake datasets demonstrate that the proposed method outperforms Xception.
arXiv Detail & Related papers (2021-09-29T01:54:13Z)
- ReSSL: Relational Self-Supervised Learning with Weak Augmentation [68.47096022526927]
Self-supervised learning has achieved great success in learning visual representations without data annotations.
We introduce a novel relational SSL paradigm that learns representations by modeling the relationship between different instances (see the relational-consistency sketch after this list).
Our proposed ReSSL significantly outperforms the previous state-of-the-art algorithms in terms of both performance and training efficiency.
arXiv Detail & Related papers (2021-07-20T06:53:07Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Hard Negative Mixing for Contrastive Learning [29.91220669060252]
We argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected.
We propose hard negative mixing strategies at the feature level that can be computed on the fly with minimal computational overhead (see the mixing sketch after this list).
arXiv Detail & Related papers (2020-10-02T14:34:58Z)
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
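For the MMLatch entry above, here is a minimal sketch of top-down feedback fusion: a first bottom-up pass summarizes each stream, and those summaries gate the raw inputs of the other stream before a second pass. The two-modality GRU setup and all dimensions are assumptions in the spirit of the abstract, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class FeedbackFusion(nn.Module):
    """Illustrative top-down feedback: high-level summaries from a first
    forward pass produce sigmoid masks that re-weight the raw inputs
    before a second, final pass."""

    def __init__(self, dim_a=64, dim_b=64, hidden=64, num_classes=3):
        super().__init__()
        self.enc_a = nn.GRU(dim_a, hidden, batch_first=True)
        self.enc_b = nn.GRU(dim_b, hidden, batch_first=True)
        # Top-down: each modality's summary gates the *other* modality's input.
        self.mask_a = nn.Linear(hidden, dim_a)
        self.mask_b = nn.Linear(hidden, dim_b)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, xa, xb):
        # Bottom-up pass: summarize each stream with the final GRU state.
        _, ha = self.enc_a(xa)
        _, hb = self.enc_b(xb)
        # Top-down pass: mask inputs with gates driven by the other stream.
        xa = xa * torch.sigmoid(self.mask_a(hb[-1])).unsqueeze(1)
        xb = xb * torch.sigmoid(self.mask_b(ha[-1])).unsqueeze(1)
        _, ha = self.enc_a(xa)
        _, hb = self.enc_b(xb)
        return self.classifier(torch.cat([ha[-1], hb[-1]], dim=-1))

model = FeedbackFusion()
logits = model(torch.randn(8, 50, 64), torch.randn(8, 50, 64))  # (8, 3)
```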
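For the ReSSL entry, a sketch of one way to model relations between instances: push the similarity distribution of a strongly augmented view, computed over a queue of past embeddings, toward that of a weakly augmented view. The temperatures, queue setup, and function name below are assumptions based on common contrastive-learning practice.

```python
import torch
import torch.nn.functional as F

def relational_loss(student_emb, teacher_emb, queue, t_s=0.1, t_t=0.04):
    """Illustrative relational consistency objective: align the student's
    (strong-augmentation) similarity distribution over the queue with the
    teacher's (weak-augmentation) distribution."""
    s = F.normalize(student_emb, dim=-1) @ F.normalize(queue, dim=-1).T
    t = F.normalize(teacher_emb, dim=-1) @ F.normalize(queue, dim=-1).T
    p_s = F.log_softmax(s / t_s, dim=-1)       # strong view, softer distribution
    p_t = F.softmax(t / t_t, dim=-1).detach()  # weak view as a sharper target
    return -(p_t * p_s).sum(dim=-1).mean()     # cross-entropy between relations

student = torch.randn(32, 128)  # embeddings of strongly augmented images
teacher = torch.randn(32, 128)  # embeddings of weakly augmented images
queue = torch.randn(4096, 128)  # memory queue of past embeddings
loss = relational_loss(student, teacher, queue)
```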
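For the hard negative mixing entry, a sketch of feature-level mixing: select the negatives most similar to the query and synthesize additional negatives as convex combinations of random pairs of them. The function name and parameters are hypothetical, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def mix_hard_negatives(query, negatives, n_hard=8, n_mix=4):
    """Illustrative on-the-fly hard-negative mixing in feature space."""
    q = F.normalize(query, dim=-1)          # (d,)
    neg = F.normalize(negatives, dim=-1)    # (n, d)
    sims = neg @ q                          # cosine similarity to the query
    hard = neg[sims.topk(n_hard).indices]   # the hardest negatives
    i = torch.randint(0, n_hard, (n_mix,))
    j = torch.randint(0, n_hard, (n_mix,))
    alpha = torch.rand(n_mix, 1)
    mixed = alpha * hard[i] + (1 - alpha) * hard[j]
    return F.normalize(mixed, dim=-1)       # re-project to the unit sphere

query = torch.randn(128)
negatives = torch.randn(4096, 128)
extra_negatives = mix_hard_negatives(query, negatives)  # (4, 128)
```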
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.