MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
- URL: http://arxiv.org/abs/2508.12522v1
- Date: Sun, 17 Aug 2025 23:08:21 GMT
- Title: MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
- Authors: Muhammad Osama Zeeshan, Natacha Gillet, Alessandro Lameiras Koerich, Marco Pedersoli, Francois Bremond, Eric Granger
- Abstract summary: We introduce MuSACo, a multi-modal subject-specific selection and adaptation method for personalized expression recognition. This makes MuSACo relevant for affective computing applications in digital health, such as patient-specific assessment for stress or pain. Our experimental results on challenging multimodal ER datasets, BioVid and StressID, show that MuSACo can outperform UDA (blending) and state-of-the-art MSDA methods.
- Score: 52.99217736494484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalized expression recognition (ER) involves adapting a machine learning model to subject-specific data for improved recognition of expressions with considerable interpersonal variability. Subject-specific ER can benefit significantly from multi-source domain adaptation (MSDA) methods, where each domain corresponds to a specific subject, to improve model accuracy and robustness. Despite promising results, state-of-the-art MSDA approaches often overlook multimodal information or blend sources into a single domain, limiting subject diversity and failing to explicitly capture unique subject-specific characteristics. To address these limitations, we introduce MuSACo, a multi-modal subject-specific selection and adaptation method for ER based on co-training. It leverages complementary information across multiple modalities and multiple source domains for subject-specific adaptation. This makes MuSACo particularly relevant for affective computing applications in digital health, such as patient-specific assessment for stress or pain, where subject-level nuances are crucial. MuSACo selects source subjects relevant to the target and generates pseudo-labels using the dominant modality for class-aware learning, in conjunction with a class-agnostic loss to learn from less confident target samples. Finally, source features from each modality are aligned, while only confident target features are combined. Our experimental results on challenging multimodal ER datasets: BioVid and StressID, show that MuSACo can outperform UDA (blending) and state-of-the-art MSDA methods.
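The abstract describes two concrete steps that can be illustrated in code: selecting source subjects relevant to the target, and generating pseudo-labels from the dominant modality while routing low-confidence target samples to a class-agnostic loss. The sketch below is a minimal, hypothetical illustration of those two ideas only; the similarity criterion, confidence threshold, and function names are assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_source_subjects(source_feats, target_feats, k=2):
    """Rank source subjects by cosine similarity between their mean feature
    vector and the target subject's mean feature vector, keeping the top-k.
    (Hypothetical selection criterion; the paper's exact metric may differ.)"""
    t = target_feats.mean(axis=0)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scores = {subj: cosine(f.mean(axis=0), t) for subj, f in source_feats.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def pseudo_labels_from_dominant(probs_by_modality, dominant, threshold=0.8):
    """Generate pseudo-labels from the dominant modality's class probabilities.
    Confident samples receive a hard label for class-aware learning; the rest
    are marked -1, i.e. they would be handled by a class-agnostic loss."""
    p = probs_by_modality[dominant]            # (N, C) softmax outputs
    conf = p.max(axis=1)
    labels = p.argmax(axis=1)
    labels[conf < threshold] = -1              # low confidence -> class-agnostic
    return labels
```

In this sketch, the `-1` entries mark exactly the "less confident target samples" the abstract mentions; a training loop would apply a supervised loss to labeled samples and the class-agnostic loss to the rest.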
Related papers
- Progressive Multi-Source Domain Adaptation for Personalized Facial Expression Recognition [51.61979855488214]
Personalized facial expression recognition (FER) involves adapting a machine learning model using samples from labeled sources and unlabeled target domains. We propose a progressive MSDA approach that gradually introduces information from subjects based on their similarity to the target subject. Our experiments show the effectiveness of our proposed method on pain datasets: BioVid and UNBC-McMaster.
arXiv Detail & Related papers (2025-04-05T19:14:51Z) - AMM-Diff: Adaptive Multi-Modality Diffusion Network for Missing Modality Imputation [2.8498944632323755]
In clinical practice, full imaging is not always feasible, often due to complex acquisition protocols, stringent privacy regulations, or specific clinical needs. A promising solution is missing data imputation, where absent modalities are generated from available ones. We propose an Adaptive Multi-Modality Diffusion Network (AMM-Diff), a novel diffusion-based generative model capable of handling any number of input modalities and generating the missing ones.
arXiv Detail & Related papers (2025-01-22T12:29:33Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Subject-Based Domain Adaptation for Facial Expression Recognition [51.10374151948157]
Adapting a deep learning model to a specific target individual is a challenging facial expression recognition task.
This paper introduces a new MSDA method for subject-based domain adaptation in FER.
It efficiently leverages information from multiple source subjects to adapt a deep FER model to a single target individual.
arXiv Detail & Related papers (2023-12-09T18:40:37Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - MS-MDA: Multisource Marginal Distribution Adaptation for Cross-subject and Cross-session EEG Emotion Recognition [14.065932956210336]
We propose a multi-source marginal distribution adaptation (MS-MDA) for EEG emotion recognition.
First, we assume that different EEG data share the same low-level features; we then construct independent branches to perform one-to-one domain adaptation and extract domain-specific features.
Experimental results show that the MS-MDA outperforms the comparison methods and state-of-the-art models in cross-session and cross-subject transfer scenarios.
arXiv Detail & Related papers (2021-07-16T07:19:54Z)
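The MS-MDA entry above describes a shared low-level feature extractor followed by one independent branch per source domain, with each branch adapted to the target one-to-one rather than blending all sources. The following is a minimal NumPy sketch of that structure under assumed shapes and an assumed averaging rule for combining branches; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_extractor(x, w_shared):
    # Shared low-level features: the same linear map + ReLU for every domain.
    return np.maximum(x @ w_shared, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One independent branch (classifier head) per source domain, so each source
# is paired with the target one-to-one instead of being merged into a single
# blended source. Shapes and the number of sources are illustrative.
w_shared = rng.normal(size=(8, 16))
branches = {f"source_{i}": rng.normal(size=(16, 3)) for i in range(3)}

target_x = rng.normal(size=(5, 8))                      # 5 target samples
h = shared_extractor(target_x, w_shared)
per_branch_probs = {name: softmax(h @ w) for name, w in branches.items()}
# Combine branch posteriors by averaging (one common MSDA ensemble rule).
avg_probs = np.mean(list(per_branch_probs.values()), axis=0)
```

Each branch would additionally carry its own domain-alignment loss between its source and the target; the sketch shows only the forward structure.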
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.