Related papers: Incremental Image Labeling via Iterative Refinement

Related papers

Hierarchical Semantic Alignment for Image Clustering [59.277605709780524]
We propose a hierarChical semAntic alignmEnt method for image clustering, dubbed CAE, which improves cluster- ing performance in a training-free manner.<n>We first select relevant nouns from WordNet and descriptions from caption datasets to construct a semantic space aligned with image features.<n>Then, we align image features with selected nouns and captions via optimal transport to obtain a more discriminative semantic space.
arXiv Detail & Related papers (2025-11-30T14:14:51Z)
Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection [50.68751788132789]
This study introduces an innovative methodology named as Weighted Semantic Map with Auto-adaptive Candidate Editing Network (WSAE-Net)<n>The generation of the weighted semantic map is designed to maximize the reduction of non-semantic feature units that need to be computed.<n>The auto-adaptive candidate editing sequences are designed to determine the optimal computational order among the feature units to be processed.
arXiv Detail & Related papers (2025-11-17T05:34:10Z)
Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image Classification [8.864897133482907]
This paper proposes a novel approach named Semantic-aware representation learning via Conditional Transport for Multi-Label Image Classification (SCT)<n>The proposed method introduces a semantic-related feature learning module that extracts discriminative label-specific features by emphasizing semantic relevance and interaction.<n>Experiments on two widely-used benchmark datasets, VOC2007 and MS-COCO, validate the effectiveness of SCT and demonstrate its superior performance compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2025-07-20T11:15:24Z)
Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning [37.13424985128905]
Vision-language models pre-trained on large-scale image-text pairs could alleviate the challenge of limited labeled data under SSMLL setting. We propose a context-based semantic-aware alignment method to solve the SSMLL problem.
arXiv Detail & Related papers (2024-12-25T09:06:54Z)
Learning Semantic-Aware Representation in Visual-Language Models for Multi-Label Recognition with Partial Labels [19.740929527669483]
Multi-label recognition with partial labels (MLR-PL) is a practical task in computer vision. We introduce a semantic decoupling module and a category-specific prompt optimization method in CLIP-based framework. Our method effectively separates information from different categories and achieves better performance compared to CLIP-based baseline method.
arXiv Detail & Related papers (2024-12-14T14:31:36Z)
Label-template based Few-Shot Text Classification with Contrastive Learning [7.964862748983985]
We propose a simple and effective few-shot text classification framework. Label templates are embedded into input sentences to fully utilize the potential value of class labels. supervised contrastive learning is utilized to model the interaction information between support samples and query samples.
arXiv Detail & Related papers (2024-12-13T12:51:50Z)
HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing [54.970275599061594]
We design an adaptive evaluation framework, called Hierarchical and Multi-Grained Inconsistency Evaluation (HMGIE) HMGIE can provide multi-grained evaluations covering both accuracy and completeness for various image-caption pairs. To verify the efficacy and flexibility of the proposed framework, we construct MVTID, an image-caption dataset with diverse types and granularities of inconsistencies.
arXiv Detail & Related papers (2024-12-07T15:47:49Z)
Teaching VLMs to Localize Specific Objects from In-context Examples [56.797110842152]
We find that present-day Vision-Language Models (VLMs) lack a fundamental cognitive ability: learning to localize specific objects in a scene by taking into account the context. This work is the first to explore and benchmark personalized few-shot localization for VLMs.
arXiv Detail & Related papers (2024-11-20T13:34:22Z)
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data. We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data. It uses knowledge of pre-trained Large Language Model to learn prompts to adapt pretrained Vision-Language Model like CLIP to multilabel classification. Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z)
Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge. We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
A semantics-driven methodology for high-quality image annotation [4.7590051176368915]
We propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology. Key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means for providing the meaning of natural language labels. The methodology is validated on images populating a subset of the ImageNet hierarchy.
arXiv Detail & Related papers (2023-07-26T11:38:45Z)
Semantic Contrastive Bootstrapping for Single-positive Multi-label Recognition [36.3636416735057]
We present a semantic contrastive bootstrapping (Scob) approach to gradually recover the cross-object relationships. We then propose a recurrent semantic masked transformer to extract iconic object-level representations. Extensive experimental results demonstrate that the proposed joint learning framework surpasses the state-of-the-art models.
arXiv Detail & Related papers (2023-07-15T01:59:53Z)
Domain Adaptive Multiple Instance Learning for Instance-level Prediction of Pathological Images [45.132775668689604]
We propose a new task setting to improve the classification performance of the target dataset without increasing annotation costs. In order to combine the supervisory information of both methods effectively, we propose a method to create pseudo-labels with high confidence.
arXiv Detail & Related papers (2023-04-07T08:31:06Z)
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, ie, CLIP, to compensate for insufficient annotations. We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide model. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors. We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve 7% Micro F1-score upon current state-of-the-art benchmarks. We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.