Incremental Image Labeling via Iterative Refinement
- URL: http://arxiv.org/abs/2304.08989v1
- Date: Tue, 18 Apr 2023 13:37:22 GMT
- Title: Incremental Image Labeling via Iterative Refinement
- Authors: Fausto Giunchiglia, Xiaolei Diao, Mayukh Bagchi
- Score: 4.7590051176368915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data quality is critical for multimedia tasks, while various types of
systematic flaws are found in image benchmark datasets, as discussed in recent
work. In particular, the existence of the semantic gap problem leads to a
many-to-many mapping between the information extracted from an image and its
linguistic description. This unavoidable bias further leads to poor performance
on current computer vision tasks. To address this issue, we introduce a
Knowledge Representation (KR)-based methodology to provide guidelines driving
the labeling process, thereby indirectly introducing intended semantics in ML
models. Specifically, an iterative refinement-based annotation method is
proposed to optimize data labeling by organizing objects in a classification
hierarchy according to their visual properties, ensuring that they are aligned
with their linguistic descriptions. Preliminary results verify the
effectiveness of the proposed method.
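The abstract describes the annotation method only at a high level. To illustrate what iterative, top-down refinement over a property-based classification hierarchy can look like, here is a minimal Python sketch. It is not the authors' algorithm. The `Concept` class, the `refine_label` function, the toy vehicle hierarchy, and the rule "refine to a child only when exactly one child's visual properties are all observed" are hypothetical assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical node of a classification hierarchy.

    `properties` are the visual properties assumed to distinguish this
    concept from its siblings; every non-root node should have at least one,
    since an empty set would match any image.
    """
    label: str
    properties: frozenset[str] = frozenset()
    children: list[Concept] = field(default_factory=list)


def refine_label(root: Concept, observed: set[str]) -> list[str]:
    """Refine a label top-down, one hierarchy level per iteration.

    At each level, move to a child only if it is the single child whose
    distinguishing properties were all observed. If no child matches, or
    several do, stop and keep the current (coarser) label, so the label never
    claims more than the visual evidence supports.
    """
    path = [root.label]
    node = root
    while node.children:
        matches = [c for c in node.children if c.properties <= observed]
        if len(matches) != 1:
            break  # no evidence or ambiguous evidence: stop refining here
        node = matches[0]
        path.append(node.label)
    return path


# Toy hierarchy (illustrative only; not taken from the paper).
vehicle = Concept("vehicle", children=[
    Concept("bicycle", frozenset({"two wheels", "pedals"})),
    Concept("car", frozenset({"four wheels", "windshield"}), children=[
        Concept("convertible", frozenset({"no roof"})),
        Concept("sedan", frozenset({"fixed roof", "trunk"})),
    ]),
])

print(refine_label(vehicle, {"four wheels", "windshield", "no roof"}))
# ['vehicle', 'car', 'convertible']
print(refine_label(vehicle, {"four wheels", "windshield"}))
# ['vehicle', 'car']
```

Stopping at the coarser label when the evidence is missing or ambiguous is one simple way to keep each assigned label consistent with what is actually visible; the paper's own criteria for aligning visual properties with linguistic descriptions may differ.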
Related papers
- Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning [37.13424985128905]
Vision-language models pre-trained on large-scale image-text pairs could alleviate the challenge of limited labeled data in the semi-supervised multi-label learning (SSMLL) setting.
We propose a context-based semantic-aware alignment method to solve the SSMLL problem.
Date: 2024-12-25T09:06:54Z
- Learning Semantic-Aware Representation in Visual-Language Models for Multi-Label Recognition with Partial Labels [19.740929527669483]
Multi-label recognition with partial labels (MLR-PL) is a practical task in computer vision.
We introduce a semantic decoupling module and a category-specific prompt optimization method in a CLIP-based framework.
Our method effectively separates information from different categories and outperforms the CLIP-based baseline method.
Date: 2024-12-14T14:31:36Z
- Label-template based Few-Shot Text Classification with Contrastive Learning [7.964862748983985]
We propose a simple and effective few-shot text classification framework.
Label templates are embedded into input sentences to fully utilize the potential value of class labels.
Supervised contrastive learning is used to model the interaction information between support samples and query samples.
Date: 2024-12-13T12:51:50Z
- HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing [54.970275599061594]
We design an adaptive evaluation framework, called Hierarchical and Multi-Grained Inconsistency Evaluation (HMGIE).
HMGIE can provide multi-grained evaluations covering both accuracy and completeness for various image-caption pairs.
To verify the efficacy and flexibility of the proposed framework, we construct MVTID, an image-caption dataset with diverse types and granularities of inconsistencies.
Date: 2024-12-07T15:47:49Z
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
Date: 2024-10-31T06:55:24Z
- Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses the knowledge of a pre-trained large language model (LLM) to learn prompts that adapt a pre-trained vision-language model such as CLIP to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
Date: 2024-03-02T13:43:32Z
- A semantics-driven methodology for high-quality image annotation [4.7590051176368915]
We propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology.
A key element of vTelos is the use of the WordNet lexico-semantic hierarchy as the main means of providing the meaning of natural language labels.
The methodology is validated on images populating a subset of the ImageNet hierarchy.
Date: 2023-07-26T11:38:45Z
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in a vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
Date: 2023-03-23T12:39:20Z
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel-wise and spatial-wise attention matrices that guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
Date: 2022-04-08T00:55:15Z
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
Date: 2022-02-17T22:40:19Z
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
Date: 2021-01-02T07:13:41Z