Incremental Image Labeling via Iterative Refinement
- URL: http://arxiv.org/abs/2304.08989v1
- Date: Tue, 18 Apr 2023 13:37:22 GMT
- Title: Incremental Image Labeling via Iterative Refinement
- Authors: Fausto Giunchiglia, Xiaolei Diao, Mayukh Bagchi
- Abstract summary: The semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description.
This unavoidable bias further leads to poor performance on current computer vision tasks.
We introduce a Knowledge Representation (KR)-based methodology to provide guidelines driving the labeling process.
- Score: 4.7590051176368915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data quality is critical for multimedia tasks, yet recent work has
found various types of systematic flaws in image benchmark datasets. In
particular, the existence of the semantic gap problem leads to a
many-to-many mapping between the information extracted from an image and its
linguistic description. This unavoidable bias further leads to poor performance
on current computer vision tasks. To address this issue, we introduce a
Knowledge Representation (KR)-based methodology to provide guidelines driving
the labeling process, thereby indirectly introducing intended semantics in ML
models. Specifically, an iterative refinement-based annotation method is
proposed to optimize data labeling by organizing objects in a classification
hierarchy according to their visual properties, ensuring that they are aligned
with their linguistic descriptions. Preliminary results verify the
effectiveness of the proposed method.
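The iterative refinement idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Node` class, the `differentia` field (a visual property that separates a node from its siblings), the `refine_label` function, and the toy instrument hierarchy are all hypothetical. The sketch descends a classification hierarchy one level per iteration, and stops as soon as the visible properties no longer single out exactly one child.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One class in a toy classification hierarchy (hypothetical structure)."""
    label: str
    # Visual property distinguishing this node from its siblings (assumed).
    differentia: str = ""
    children: List["Node"] = field(default_factory=list)


def refine_label(root: Node, visible: Set[str]) -> List[str]:
    """Return the path of labels reached by refining one level per iteration.

    Refinement continues only while exactly one child's distinguishing
    property is among the visible properties of the image.
    """
    path = [root.label]
    node = root
    while True:
        matches = [c for c in node.children if c.differentia in visible]
        if len(matches) != 1:
            break  # no finer property visible, or ambiguous: stop here
        node = matches[0]
        path.append(node.label)
    return path


# Toy hierarchy, invented for this example.
hierarchy = Node("instrument", children=[
    Node("string instrument", "has strings", [
        Node("guitar", "has fretted neck"),
        Node("harp", "has triangular frame"),
    ]),
    Node("wind instrument", "has mouthpiece"),
])

print(refine_label(hierarchy, {"has strings", "has fretted neck"}))
```

When only a coarse property is visible, the sketch stops at the coarser label instead of guessing a finer one, which is one simple way to keep a label aligned with what the image actually shows.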
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses the knowledge of a pre-trained Large Language Model to learn prompts that adapt a pre-trained Vision-Language Model such as CLIP to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- A semantics-driven methodology for high-quality image annotation [4.7590051176368915]
We propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology.
A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means of providing the meaning of natural language labels.
The methodology is validated on images populating a subset of the ImageNet hierarchy.
arXiv Detail & Related papers (2023-07-26T11:38:45Z)
- Semantic Contrastive Bootstrapping for Single-positive Multi-label Recognition [36.3636416735057]
We present a semantic contrastive bootstrapping (Scob) approach to gradually recover the cross-object relationships.
We then propose a recurrent semantic masked transformer to extract iconic object-level representations.
Extensive experimental results demonstrate that the proposed joint learning framework surpasses the state-of-the-art models.
arXiv Detail & Related papers (2023-07-15T01:59:53Z)
- Domain Adaptive Multiple Instance Learning for Instance-level Prediction of Pathological Images [45.132775668689604]
We propose a new task setting to improve the classification performance of the target dataset without increasing annotation costs.
In order to combine the supervisory information of both methods effectively, we propose a method to create pseudo-labels with high confidence.
arXiv Detail & Related papers (2023-04-07T08:31:06Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)