SemCovNet: Towards Fair and Semantic Coverage-Aware Learning for Underrepresented Visual Concepts
- URL: http://arxiv.org/abs/2602.16917v1
- Date: Wed, 18 Feb 2026 22:18:29 GMT
- Title: SemCovNet: Towards Fair and Semantic Coverage-Aware Learning for Underrepresented Visual Concepts
- Authors: Sakib Ahammed, Xia Cui, Xinqi Fan, Wenqi Lu, Moi Hoon Yap
- Abstract summary: Existing datasets exhibit Semantic Coverage Imbalance (SCI). SCI occurs at the semantic level, affecting how models learn and reason about rare yet meaningful semantics. We propose Semantic Coverage-Aware Network (SemCovNet), a novel model that explicitly learns to correct SCI.
- Score: 11.181779608395184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern vision models increasingly rely on rich semantic representations that extend beyond class labels to include descriptive concepts and contextual attributes. However, existing datasets exhibit Semantic Coverage Imbalance (SCI), a previously overlooked bias arising from the long-tailed distribution of semantic representations. Unlike class imbalance, SCI occurs at the semantic level, affecting how models learn and reason about rare yet meaningful semantics. To mitigate SCI, we propose Semantic Coverage-Aware Network (SemCovNet), a novel model that explicitly learns to correct semantic coverage disparities. SemCovNet integrates a Semantic Descriptor Map (SDM) for learning semantic representations, a Descriptor Attention Modulation (DAM) module that dynamically weights visual and concept features, and a Descriptor-Visual Alignment (DVA) loss that aligns visual features with descriptor semantics. We quantify semantic fairness using a Coverage Disparity Index (CDI), which measures the alignment between coverage and error. Extensive experiments across multiple datasets demonstrate that SemCovNet enhances model reliability and substantially reduces CDI, achieving fairer and more equitable performance. This work establishes SCI as a measurable and correctable bias, providing a foundation for advancing semantic fairness and interpretable vision learning.
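The paper's code is not reproduced here, so the sketch below is only illustrative: a DAM-style gate that mixes visual and descriptor features, plus one plausible reading of the CDI as a coverage-error correlation. All module shapes, names, and the CDI formula are assumptions rather than the authors' definitions.
```python
# Hypothetical sketch: shapes and the CDI formula are assumptions,
# not taken from the SemCovNet paper.
import torch
import torch.nn as nn

class DescriptorAttentionGate(nn.Module):
    """DAM-style gating: learn a per-dimension weight that mixes
    visual features with descriptor (concept) features."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, visual: torch.Tensor, concept: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([visual, concept], dim=-1))  # (B, D), values in [0, 1]
        return g * visual + (1.0 - g) * concept

def coverage_disparity_index(coverage: torch.Tensor, error: torch.Tensor) -> torch.Tensor:
    """One plausible reading of CDI: |correlation| between per-concept
    coverage and per-concept error rate. Under this toy definition a
    model whose errors are not concentrated on poorly covered concepts
    scores near zero."""
    c = (coverage - coverage.mean()) / (coverage.std() + 1e-8)
    e = (error - error.mean()) / (error.std() + 1e-8)
    return (c * e).mean().abs()

fused = DescriptorAttentionGate(dim=256)(torch.randn(4, 256), torch.randn(4, 256))
cdi = coverage_disparity_index(torch.rand(50), torch.rand(50))  # 50 concepts
```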
Related papers
- VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set [80.50996301430108]
The alignment of vision-language representations endows current Vision-Language Models with strong multi-modal reasoning capabilities.
We propose VL-SAE, a sparse autoencoder that encodes vision-language representations into its hidden activations.
For interpretation, the alignment between vision and language representations can be understood by comparing their semantics with concepts.
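As a rough illustration of the sparse-autoencoder idea (not the published VL-SAE; the dimensions and L1 sparsity penalty below are assumptions):
```python
# Illustrative sparse autoencoder over joint vision-language features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, feat_dim: int, n_concepts: int):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, n_concepts)
        self.decoder = nn.Linear(n_concepts, feat_dim)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # hidden activations ~ unified concept set
        return self.decoder(z), z

sae = SparseAutoencoder(feat_dim=512, n_concepts=4096)
x = torch.randn(8, 512)                 # vision or language embeddings
recon, z = sae(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * z.abs().mean()  # reconstruction + L1 sparsity
```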
arXiv Detail & Related papers (2025-10-24T10:29:31Z)
- Improving vision-language alignment with graph spiking hybrid Networks [10.88584928028832]
This paper proposes a comprehensive visual semantic representation module that uses panoptic segmentation to generate fine-grained semantic features.
We propose a novel Graph Spiking Hybrid Network (GSHN) that integrates the complementary advantages of Spiking Neural Networks (SNNs) and Graph Attention Networks (GATs) to encode visual semantic information.
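A toy hybrid of graph attention and a spiking threshold, just to show the general shape of an SNN/GAT combination; this is not the authors' GSHN, and every layer choice here is an assumption.
```python
# Toy spiking graph-attention layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikingGraphAttention(nn.Module):
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.score = nn.Linear(2 * dim, 1)
        self.threshold = threshold

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, D) node features; adj: (N, N) adjacency with self-loops.
        h = self.proj(x)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        logits = self.score(pair).squeeze(-1).masked_fill(adj == 0, float("-inf"))
        membrane = F.softmax(logits, dim=-1) @ h   # attention-weighted aggregation
        return (membrane > self.threshold).float() # binary spikes

layer = SpikingGraphAttention(dim=16)
spikes = layer(torch.randn(5, 16), torch.ones(5, 5))  # fully connected toy graph
```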
arXiv Detail & Related papers (2025-01-31T11:55:17Z)
- Simple Semantic-Aided Few-Shot Learning [2.8686437689115354]
Learning from a limited amount of data, namely Few-Shot Learning, stands out as a challenging computer vision task.
We design an automatic way called Semantic Evolution to generate high-quality semantics.
We employ a simple two-layer network termed Semantic Alignment Network to transform semantics and visual features into robust class prototypes.
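A minimal two-layer alignment network in the spirit of this summary; the input/output dimensions and the cosine classifier below are assumptions, not the paper's exact design.
```python
# Two-layer semantic-to-prototype mapping (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAlignmentNet(nn.Module):
    def __init__(self, sem_dim: int, vis_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sem_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, vis_dim))

    def forward(self, class_semantics: torch.Tensor) -> torch.Tensor:
        return self.net(class_semantics)  # one prototype per class, in visual space

proto = SemanticAlignmentNet(sem_dim=300, vis_dim=640)(torch.randn(5, 300))
query = torch.randn(10, 640)              # few-shot query features
logits = F.normalize(query, dim=-1) @ F.normalize(proto, dim=-1).T  # cosine scores
```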
arXiv Detail & Related papers (2023-11-30T15:57:34Z)
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning [74.48337375174297]
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, enabling unmatched semantic-visual pairs to be recast as matched ones.
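A sketch of prototype-to-visual cross-attention, a plausible stand-in for the instance-motivated semantic encoder; all shapes and the single-layer setup are assumptions.
```python
# Attribute prototypes as queries over image-region features (illustrative).
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
prototypes = torch.randn(2, 85, 256)   # (batch, attribute prototypes, dim) as queries
patches = torch.randn(2, 196, 256)     # (batch, image regions, dim) as keys/values
instance_prototypes, weights = attn(prototypes, patches, patches)
# Each attribute prototype is re-estimated from the regions it attends to,
# adapting the semantics to the particular image instance.
```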
arXiv Detail & Related papers (2023-03-27T15:21:43Z)
- Imitation Learning-based Implicit Semantic-aware Communication Networks: Multi-layer Representation and Collaborative Reasoning [68.63380306259742]
Despite their promising potential, semantic communications and semantic-aware networking are still in their infancy.
We propose a novel reasoning-based implicit semantic-aware communication network architecture that allows multiple tiers of CDC and edge servers to collaborate.
We introduce a new multi-layer representation of semantic information that takes into account both the hierarchical structure of implicit semantics and the personalized inference preferences of individual users.
arXiv Detail & Related papers (2022-10-28T13:26:08Z)
- GlanceNets: Interpretabile, Leak-proof Concept-based Models [23.7625973884849]
Concept-based models (CBMs) combine high-performance and interpretability by acquiring and reasoning with a vocabulary of high-level concepts.
We provide a clear definition of interpretability in terms of alignment between the model's representation and an underlying data generation process.
We introduce GlanceNets, a new CBM that exploits techniques from disentangled representation learning and open-set recognition to achieve alignment.
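A generic concept-bottleneck model for illustration; GlanceNets adds disentangled-representation and open-set machinery on top, which is omitted here, and all dimensions are assumptions.
```python
# Minimal concept-bottleneck model: predict concepts, then reason over them.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.concept_head = nn.Linear(in_dim, n_concepts)   # high-level concepts
        self.label_head = nn.Linear(n_concepts, n_classes)  # label from concepts only

    def forward(self, features: torch.Tensor):
        concepts = torch.sigmoid(self.concept_head(features))
        return self.label_head(concepts), concepts           # logits + explanation

model = ConceptBottleneckModel(in_dim=512, n_concepts=112, n_classes=200)
logits, concepts = model(torch.randn(4, 512))
```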
arXiv Detail & Related papers (2022-05-31T08:53:53Z)
- Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
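The zero-shot classification step reduces to comparing video and label embeddings in a shared space; the sketch below stubs both encoders with random features, so only the scoring logic is meaningful.
```python
# Zero-shot scoring by cosine similarity in a shared embedding space.
import torch
import torch.nn.functional as F

video_emb = F.normalize(torch.randn(8, 512), dim=-1)    # from a video encoder (stubbed)
label_emb = F.normalize(torch.randn(101, 512), dim=-1)  # from a text encoder; 101 UCF101 classes
scores = video_emb @ label_emb.T                         # cosine similarities
top1 = scores.argmax(dim=-1)                             # predicted (unseen) class per clip
```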
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
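A toy version of the clustering idea: group local region features into visual clusters and describe each class by its cluster histogram. The feature sizes, cluster count, and histogram embedding are assumptions, not VGSE's exact procedure.
```python
# Cluster local regions, then embed classes as cluster histograms.
import numpy as np
from sklearn.cluster import KMeans

regions = np.random.randn(10_000, 128)                 # local region features (seen classes)
region_class = np.random.randint(0, 150, size=10_000)  # class of each region's source image
kmeans = KMeans(n_clusters=64, n_init=10).fit(regions)

# Class embedding = normalized histogram of its regions' cluster assignments.
emb = np.zeros((150, 64))
for cls in range(150):
    ids = kmeans.labels_[region_class == cls]
    emb[cls] = np.bincount(ids, minlength=64) / max(len(ids), 1)
```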
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
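A tiny variational memory read, purely to convey the general mechanism: attend over memory slots, parameterize a Gaussian, and sample with the reparameterization trick. The hierarchical inference and hypernetwork update of the actual model are omitted, and all shapes are assumptions.
```python
# Illustrative variational read from a learned semantic memory.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalMemoryRead(nn.Module):
    def __init__(self, dim: int, slots: int = 32):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(slots, dim))
        self.mu = nn.Linear(2 * dim, dim)
        self.logvar = nn.Linear(2 * dim, dim)

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        attn = F.softmax(query @ self.memory.T, dim=-1)  # (B, slots)
        read = attn @ self.memory                         # retrieved semantic memory
        h = torch.cat([query, read], dim=-1)
        mu, logvar = self.mu(h), self.logvar(h)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # z ~ q(z | x, M)

z = VariationalMemoryRead(dim=64)(torch.randn(4, 64))
```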
arXiv Detail & Related papers (2021-06-05T20:40:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.