Related papers: The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning

The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning

URL: http://arxiv.org/abs/2508.21816v1
Date: Fri, 29 Aug 2025 17:51:55 GMT
Title: The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning
Authors: Yiming Lin, Yuchen Niu, Shang Wang, Kaizhu Huang, Qiufeng Wang, Xiao-Bo Jin,
Abstract summary: Context recognition is a fundamental task in computer vision that aims to extract structured semantic summaries from images.<n>Existing methods treat verb classification as a single-label problem, but we show through a comprehensive analysis that this formulation fails to address the inherent ambiguity in visual event recognition.<n>This paper makes three key contributions: First, we reveal through empirical analysis that verb classification is inherently a multi-label problem due to the ubiquitous semantic overlap between verb categories.<n>Second, given the impracticality of fully annotating large-scale datasets with multiple labels, we propose to reformulate verb classification as a single positive multi-label learning
Score: 30.485929387603463
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Context recognition (SR) is a fundamental task in computer vision that aims to extract structured semantic summaries from images by identifying key events and their associated entities. Specifically, given an input image, the model must first classify the main visual events (verb classification), then identify the participating entities and their semantic roles (semantic role labeling), and finally localize these entities in the image (semantic role localization). Existing methods treat verb classification as a single-label problem, but we show through a comprehensive analysis that this formulation fails to address the inherent ambiguity in visual event recognition, as multiple verb categories may reasonably describe the same image. This paper makes three key contributions: First, we reveal through empirical analysis that verb classification is inherently a multi-label problem due to the ubiquitous semantic overlap between verb categories. Second, given the impracticality of fully annotating large-scale datasets with multiple labels, we propose to reformulate verb classification as a single positive multi-label learning (SPMLL) problem - a novel perspective in SR research. Third, we design a comprehensive multi-label evaluation benchmark for SR that is carefully designed to fairly evaluate model performance in a multi-label setting. To address the challenges of SPMLL, we futher develop the Graph Enhanced Verb Multilayer Perceptron (GE-VerbMLP), which combines graph neural networks to capture label correlations and adversarial training to optimize decision boundaries. Extensive experiments on real-world datasets show that our approach achieves more than 3\% MAP improvement while remaining competitive on traditional top-1 and top-5 accuracy metrics.

Related papers

Semantic-Aware Representation Learning for Multi-label Image Classification [6.444512435220748]
This paper proposes a Semantic-Aware Representation Learning (SARL) for multi-label image classification.<n>First, a label semantic-related feature learning module is utilized to extract semantic-related features.<n>Second, an optimal transport-based attention mechanism is designed to obtain semantically aligned image representation.
arXiv Detail & Related papers (2025-07-20T11:15:24Z)
Semantic-guided Representation Learning for Multi-Label Recognition [13.046479112800608]
Multi-label Recognition (MLR) involves assigning multiple labels to each data instance in an image.<n>Recent Vision and Language Pre-training methods have made significant progress in tackling zero-shot MLR tasks.<n>We introduce a Semantic-guided Representation Learning approach (SigRL) that enables the model to learn effective visual and textual representations.
arXiv Detail & Related papers (2025-04-04T08:15:08Z)
A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition [6.468625143772815]
We propose a unified label-aware token-level contrastive learning framework. Our approach enriches the context by utilizing label semantics as suffix prompts. It simultaneously optimize context-native and context-label contrastive learning objectives.
arXiv Detail & Related papers (2024-04-26T06:19:21Z)
Semantic Contrastive Bootstrapping for Single-positive Multi-label Recognition [36.3636416735057]
We present a semantic contrastive bootstrapping (Scob) approach to gradually recover the cross-object relationships. We then propose a recurrent semantic masked transformer to extract iconic object-level representations. Extensive experimental results demonstrate that the proposed joint learning framework surpasses the state-of-the-art models.
arXiv Detail & Related papers (2023-07-15T01:59:53Z)
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, ie, CLIP, to compensate for insufficient annotations. We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
Triple-View Feature Learning for Medical Image Segmentation [9.992387025633805]
TriSegNet is a semi-supervised semantic segmentation framework. It uses triple-view feature learning on a limited amount of labelled data and a large amount of unlabeled data.
arXiv Detail & Related papers (2022-08-12T14:41:40Z)
Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide model. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network ( SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS) SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels. Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks. It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features. It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences. In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference. Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels. Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction. To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL)
arXiv Detail & Related papers (2020-04-20T12:13:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.