A Quantitative Evaluation of the Expressivity of BMI, Pose and Gender in Body Embeddings for Recognition and Identification
- URL: http://arxiv.org/abs/2503.06451v3
- Date: Wed, 07 May 2025 20:16:39 GMT
- Title: A Quantitative Evaluation of the Expressivity of BMI, Pose and Gender in Body Embeddings for Recognition and Identification
- Authors: Basudha Pal, Siyuan Huang, Rama Chellappa
- Abstract summary: We extend the notion of expressivity, defined as the mutual information between learned features and specific attributes, to quantify how strongly attributes are encoded. We find that BMI consistently shows the highest expressivity in the final layers, indicating its dominant role in recognition. These findings demonstrate the central role of body attributes in ReID and establish a principled approach for uncovering attribute-driven correlations.
- Score: 56.10719736365069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person Re-identification (ReID) systems that match individuals across images or video frames are essential in many real-world applications. However, existing methods are often influenced by attributes such as gender, pose, and body mass index (BMI), which vary in unconstrained settings and raise concerns related to fairness and generalization. To address this, we extend the notion of expressivity, defined as the mutual information between learned features and specific attributes, using a secondary neural network to quantify how strongly attributes are encoded. Applying this framework to three ReID models, we find that BMI consistently shows the highest expressivity in the final layers, indicating its dominant role in recognition. In the last attention layer, attributes are ranked as BMI > Pitch > Gender > Yaw, revealing their relative influences in representation learning. Expressivity values also evolve across layers and training epochs, reflecting a dynamic encoding of attributes. These findings demonstrate the central role of body attributes in ReID and establish a principled approach for uncovering attribute-driven correlations.
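The abstract defines expressivity as the mutual information between learned features and an attribute, estimated with a secondary neural network, but does not spell out the estimator in this summary. Below is a minimal sketch of one plausible instantiation using a MINE-style lower bound (Belghazi et al., 2018); the critic architecture, training schedule, and synthetic data are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: estimate "expressivity" I(z; a) between body embeddings z and an
# attribute a (e.g., BMI) with a secondary network, via a MINE lower bound.
# The paper's actual estimator and hyperparameters may differ.
import math
import torch
import torch.nn as nn

class MINECritic(nn.Module):
    """Secondary network T(z, a) that scores embedding-attribute pairs."""
    def __init__(self, embed_dim: int, attr_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def expressivity(z: torch.Tensor, a: torch.Tensor, steps: int = 500) -> float:
    """Estimate I(z; a) in nats from paired samples (z: N x D, a: N x 1)."""
    critic = MINECritic(z.shape[1], a.shape[1])
    opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
    n = z.shape[0]
    mi = torch.tensor(0.0)
    for _ in range(steps):
        perm = torch.randperm(n)                 # shuffle a to sample marginal
        t_joint = critic(z, a).mean()            # E_{p(z,a)}[T]
        t_marg = torch.logsumexp(                # log E_{p(z)p(a)}[e^T]
            critic(z, a[perm]).squeeze(-1), dim=0) - math.log(n)
        mi = t_joint - t_marg                    # Donsker-Varadhan lower bound
        opt.zero_grad()
        (-mi).backward()                         # maximize the bound
        opt.step()
    return mi.item()

# Usage with synthetic data: 1024 embeddings of dim 512, scalar "BMI"
# deliberately correlated with the first embedding dimension.
z = torch.randn(1024, 512)
bmi = 0.5 * z[:, :1] + 0.1 * torch.randn(1024, 1)
print(f"estimated expressivity of BMI: {expressivity(z, bmi):.3f} nats")
```

In this framing, a larger estimate for, say, BMI than for yaw at a given layer would correspond to the ranking reported in the abstract, and tracking the estimate per layer and per training epoch reproduces the kind of dynamic-encoding analysis the paper describes.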
Related papers
- Attributes Shape the Embedding Space of Face Recognition Models [0.0]
Face Recognition tasks have made significant progress with the advent of Deep Neural Networks. We observe a multiscale geometric structure emerging in the embedding space. We propose a geometric approach to describe the dependence or invariance of FR models to these attributes.
arXiv Detail & Related papers (2025-07-15T14:44:39Z) - Adaptive Prototype Model for Attribute-based Multi-label Few-shot Action Recognition [11.316708754749103]
In real-world action recognition systems, incorporating more attributes helps achieve a more comprehensive understanding of human behavior.
We propose a novel method, the Adaptive Attribute Prototype Model (AAPM), for human action recognition, which captures rich action-relevant attribute information.
Our AAPM achieves state-of-the-art performance in both attribute-based multi-label few-shot action recognition and single-label few-shot action recognition.
arXiv Detail & Related papers (2025-02-18T06:39:28Z) - Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
The Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to address the problem of complex interactions between attributes and object visual representations. To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module. To further improve the discriminative ability of the model, HDA-OE introduces a subclass-driven discriminative embedding (SDDE) module. The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z) - ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling [32.55352435358949]
We propose a sentence generation-based retrieval formulation for attribute recognition.
For each attribute to be recognized in an image, we measure the visually conditioned probability of generating a short sentence describing it.
We demonstrate through experiments that generative retrieval consistently outperforms contrastive retrieval on two visual reasoning datasets.
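The summary above reduces attribute recognition to scoring short sentences under an image-conditioned language model. A minimal sketch of that generative-retrieval scoring, next to a CLIP-style contrastive baseline, is below; `caption_log_prob`, the prompt template, and the embedding inputs are hypothetical stand-ins, not ArtVLM's actual interfaces.

```python
# Sketch of generative retrieval for attribute recognition: rank candidate
# attributes by the log-likelihood of a short templated sentence under an
# image-conditioned captioner, versus a contrastive (similarity) baseline.
from typing import Callable, Sequence
import numpy as np

def generative_attribute_retrieval(
    image,
    attributes: Sequence[str],
    caption_log_prob: Callable[[object, str], float],  # hypothetical model hook
    template: str = "a photo of a {} object",
) -> str:
    """Return the attribute whose templated sentence the captioner is most
    likely to generate, length-normalized so longer names are not penalized."""
    def score(attr: str) -> float:
        sentence = template.format(attr)
        return caption_log_prob(image, sentence) / max(len(sentence.split()), 1)
    return max(attributes, key=score)

def contrastive_attribute_retrieval(image_emb, text_embs, attributes):
    """CLIP-style baseline: rank attributes by cosine similarity between the
    image embedding and each attribute-sentence embedding."""
    sims = [
        float(np.dot(image_emb, t) / (np.linalg.norm(image_emb) * np.linalg.norm(t)))
        for t in text_embs
    ]
    return attributes[int(np.argmax(sims))]
```

Length-normalizing the generative score is one simple design choice to keep attributes with longer names from being penalized merely for having more tokens.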
arXiv Detail & Related papers (2024-08-07T21:44:29Z) - Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z) - Leveraging vision-language models for fair facial attribute classification [19.93324644519412]
General-purpose vision-language models (VLMs) are a rich knowledge source for common sensitive attributes.
We analyze the correspondence between VLM-predicted and human-defined sensitive attribute distributions.
Experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines.
arXiv Detail & Related papers (2024-03-15T18:37:15Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot
Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods struggle to generalize such fitted attribute interdependencies to scenes or identities outside the dataset distribution.
To render models robust in realistic scenes, we propose attribute-disentangled feature learning, which ensures that recognizing one attribute does not depend on inferring the existence of others (a rough sketch follows).
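The entry names mutual information minimization as the disentanglement mechanism but does not give the loss. The sketch below uses a common adversarial proxy instead: per-attribute feature chunks plus a gradient-reversal "leakage" probe that the encoder learns to defeat. The two-attribute setup, dimensions, and probe are assumptions for illustration, not the paper's actual objective.

```python
# Sketch of attribute disentanglement via an adversarial MI-minimization proxy:
# split the feature into per-attribute chunks and suppress cross-attribute
# leakage with a gradient-reversal probe.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -g  # reversed gradient: the encoder learns to REMOVE the signal

class DisentangledHeads(nn.Module):
    def __init__(self, feat_dim: int = 256, n_attr: int = 2, n_cls: int = 2):
        super().__init__()
        self.chunk = feat_dim // n_attr
        # One classifier per attribute on its own feature chunk ...
        self.heads = nn.ModuleList(nn.Linear(self.chunk, n_cls) for _ in range(n_attr))
        # ... plus a "leakage" probe that tries to read attribute 1 from chunk 0.
        self.probe = nn.Linear(self.chunk, n_cls)

    def forward(self, feat):
        c0, c1 = feat[:, :self.chunk], feat[:, self.chunk:2 * self.chunk]
        logits = [self.heads[0](c0), self.heads[1](c1)]
        leak = self.probe(GradReverse.apply(c0))  # probe trains normally,
        return logits, leak                       # features fight it

# Usage: recognition losses per head, plus a leakage loss that pushes
# attribute-1 information out of chunk 0 (a rough stand-in for minimizing
# the mutual information between chunk 0 and attribute 1).
feat = torch.randn(8, 256, requires_grad=True)
model = DisentangledHeads()
(l0, l1), leak = model(feat)
y0, y1 = torch.randint(0, 2, (8,)), torch.randint(0, 2, (8,))
loss = (nn.functional.cross_entropy(l0, y0)
        + nn.functional.cross_entropy(l1, y1)
        + nn.functional.cross_entropy(leak, y1))
loss.backward()
```

The probe's parameters descend its cross-entropy normally, while the reversed gradient flowing into chunk 0 pushes the features to discard attribute-1 information.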
arXiv Detail & Related papers (2023-07-28T01:34:55Z) - TransFA: Transformer-based Representation for Face Attribute Evaluation [87.09529826340304]
We propose TransFA, a novel transformer-based representation for attribute evaluation.
The proposed TransFA achieves superior performances compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-07-12T10:58:06Z) - Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot (i.e., zero-shot and few-shot) image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z) - Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation [53.62256887837659]
Kinship verification is a long-standing research challenge in computer vision.
We propose a novel deep collaborative multi-modal learning (DCML) to integrate the underlying information presented in facial properties.
Our DCML method consistently outperforms state-of-the-art kinship verification methods.
arXiv Detail & Related papers (2021-09-07T01:34:51Z) - A-FMI: Learning Attributions from Deep Networks via Feature Map Importance [58.708607977437794]
Gradient-based attribution methods can aid in understanding convolutional neural networks (CNNs).
The redundancy of attribution features and the gradient saturation problem are challenges that attribution methods still face.
We propose a new concept, feature map importance (FMI), to refine the contribution of each feature map, and a novel attribution method via FMI to address the gradient saturation problem; a rough sketch follows this entry.
arXiv Detail & Related papers (2021-04-12T14:54:44Z)
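A-FMI's exact definition of feature map importance is not reproduced in the blurb above. As a rough, gradient-free stand-in (which sidesteps gradient saturation by construction), the sketch below scores each feature map by the drop in the target-class logit when that map is ablated, then builds a CAM-style attribution map; the linear head and shapes are assumptions for illustration.

```python
# Sketch: ablation-based feature map importance. Each channel's importance is
# the logit drop when that channel is zeroed; importances then weight the maps
# into a spatial attribution. No gradients are used, so saturation cannot bite.
import torch
import torch.nn as nn

@torch.no_grad()
def ablation_fmi(feats: torch.Tensor, head: nn.Module, target: int) -> torch.Tensor:
    """feats: (C, H, W) activations for one image; head maps pooled feats to logits.
    Returns an (H, W) attribution map."""
    base = head(feats.mean(dim=(1, 2)))[target]          # logit with all maps
    C = feats.shape[0]
    importance = torch.empty(C)
    for c in range(C):
        ablated = feats.clone()
        ablated[c].zero_()                               # knock out one map
        importance[c] = base - head(ablated.mean(dim=(1, 2)))[target]
    weights = importance.clamp(min=0)                    # keep helpful maps only
    return (weights[:, None, None] * feats).sum(dim=0)   # weighted sum -> (H, W)

# Usage with toy activations and a linear classification head.
feats = torch.relu(torch.randn(64, 7, 7))                # e.g., last conv block
head = nn.Linear(64, 10)                                 # pooled feats -> logits
attribution = ablation_fmi(feats, head, target=3)
print(attribution.shape)                                 # torch.Size([7, 7])
```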