Related papers: Mask-Guided Multi-Task Network for Face Attribute Recognition

URL: http://arxiv.org/abs/2601.01408v1
Date: Sun, 04 Jan 2026 07:21:15 GMT
Title: Mask-Guided Multi-Task Network for Face Attribute Recognition
Authors: Gong Gao, Zekai Wang, Jian Zhao, Ziqi Xie, Xianhui Liu, Weidong Zhao,
Abstract summary: Mask-Guided Multi-Task Network (MGMTN) integrates Adaptive Mask Learning (AML) and Group-Global Feature Fusion (G2FF). AML accurately localizes critical facial parts and generates group masks that delineate meaningful feature regions. G2FF combines group and global features to enhance FAR learning, enabling more precise attribute identification.
Score: 16.994619834678325
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Face Attribute Recognition (FAR) plays a crucial role in applications such as person re-identification, face retrieval, and face editing. Conventional multi-task attribute recognition methods often process the entire feature map for feature extraction and attribute classification, which can produce redundant features due to reliance on global regions. To address these challenges, we propose a novel approach emphasizing the selection of specific feature regions for efficient feature learning. We introduce the Mask-Guided Multi-Task Network (MGMTN), which integrates Adaptive Mask Learning (AML) and Group-Global Feature Fusion (G2FF) to address the aforementioned limitations. Leveraging a pre-trained keypoint annotation model and a fully convolutional network, AML accurately localizes critical facial parts (e.g., eye and mouth groups) and generates group masks that delineate meaningful feature regions, thereby mitigating negative transfer from global region usage. Furthermore, G2FF combines group and global features to enhance FAR learning, enabling more precise attribute identification. Extensive experiments on two challenging facial attribute recognition datasets demonstrate the effectiveness of MGMTN in improving FAR performance.

Related papers

FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition [13.392837372242907]
This study introduces FAR-AMTN, a novel Attention Multi-Task Network for Face Attribute Recognition (FAR). It incorporates a Weight-Shared Group-Specific Attention (WSGSA) module with shared parameters to minimize complexity while improving group feature representation. Experiments on the CelebA and LFWA datasets show that the proposed FAR-AMTN achieves superior accuracy with significantly fewer parameters compared to existing models.
arXiv Detail & Related papers (2026-01-04T14:20:16Z)
PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning [116.33775552866476]
Generalized zero-shot learning (GZSL) endeavors to identify the unseen using knowledge from the seen domain. GZSL suffers from insufficient visual-semantic correspondences due to attribute diversity and instance diversity. We propose a multi-granularity progressive semantic-visual adaption network, where sufficient visual elements can be gathered to remedy the inconsistency.
arXiv Detail & Related papers (2024-10-15T12:49:33Z)
Domain Consistency Representation Learning for Lifelong Person Re-Identification [31.076769754593098]
Lifelong person re-identification (LReID) exhibits a contradictory relationship between intra-domain discrimination and inter-domain gaps when learning from continuous data. We propose a novel domain consistency representation learning (DCR) model that explores global and attribute-wise representations to balance intra-domain discrimination and inter-domain gaps. Our DCR achieves superior performance compared to state-of-the-art LReID methods.
arXiv Detail & Related papers (2024-09-30T05:19:09Z)
Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification [63.147482497821166]
We first explore the influence of global and local features of ViT and then propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID. Our proposed method achieves superior performance on four object Re-ID benchmarks.
arXiv Detail & Related papers (2024-04-23T12:42:07Z)
TransFA: Transformer-based Representation for Face Attribute Evaluation [87.09529826340304]
We propose a novel transformer-based representation for attribute evaluation method (TransFA). The proposed TransFA achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-07-12T10:58:06Z)
MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection [16.261362598190807]
The Facial Action Coding System (FACS) encodes the action units (AUs) in facial images. We argue that encoding AU features just from one perspective may not capture the rich contextual information between regional and global face features. We propose a novel Multi-level Graph Reasoning Network (termed MGRR-Net) for facial AU detection.
arXiv Detail & Related papers (2022-04-04T09:47:22Z)
Domain Private and Agnostic Feature for Modality Adaptive Face Recognition [10.497190559654245]
This paper proposes a Feature Aggregation Network (FAN), which includes a disentangled representation module (DRM), a feature fusion module (FFM), and a metric penalty learning session. First, in DRM, two networks, i.e., a domain-private network and a domain-agnostic network, are specially designed for learning modality features and identity features. Second, in FFM, the identity features are fused with domain features to achieve cross-modal bi-directional identity feature transformation. Third, considering that the distribution imbalance between easy and hard pairs exists in cross-modal datasets, the identity preserving guided metric learning with adaptive
arXiv Detail & Related papers (2020-08-10T00:59:42Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN). DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains. Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)
Deep Multi-task Multi-label CNN for Effective Facial Attribute Classification [53.58763562421771]
We propose a novel deep multi-task multi-label CNN, termed DMM-CNN, for effective Facial Attribute Classification (FAC). Specifically, DMM-CNN jointly optimizes two closely related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning. Two different network architectures are designed to extract features for two groups of attributes, and a novel dynamic weighting scheme is proposed to automatically assign the loss weight to each facial attribute during training.
arXiv Detail & Related papers (2020-02-10T12:34:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.