An Effective Automatic Image Annotation Model Via Attention Model and
Data Equilibrium
- URL: http://arxiv.org/abs/2001.10590v1
- Date: Sun, 26 Jan 2020 05:59:57 GMT
- Title: An Effective Automatic Image Annotation Model Via Attention Model and
Data Equilibrium
- Authors: Amir Vatani, Milad Taleby Ahvanooey, Mostafa Rahimi
- Abstract summary: The proposed model has three phases, including a feature extractor, a tag generator, and an image annotator.
The experiments conducted on two benchmark datasets confirm the superiority of the proposed model over previous models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, a huge number of images are available. However, retrieving a
required image for an ordinary user is a challenging task in computer vision
systems. During the past two decades, much research has been devoted to
improving the performance of automatic image annotation, which has
traditionally focused on content-based image retrieval. However, recent
research demonstrates that there is a semantic gap between content-based image
retrieval and the image semantics understandable by humans. As a result,
research in this area has sought to bridge the semantic gap between low-level
image features and high-level semantics. The conventional method of
bridging the semantic gap is through the automatic image annotation (AIA) that
extracts semantic features using machine learning techniques. In this paper, we
propose a novel AIA model based on the deep learning feature extraction method.
The proposed model has three phases, including a feature extractor, a tag
generator, and an image annotator. First, the proposed model automatically
extracts high- and low-level features based on the dual-tree complex wavelet
transform (DT-CWT), singular value decomposition, the distribution of color
tones, and a deep neural network. Moreover, the tag generator balances the
dictionary of the annotated keywords by a new log-entropy auto-encoder (LEAE)
and then represents these keywords via word embeddings. Finally, the annotator
is based on a long short-term memory (LSTM) network that captures the
importance degree of specific image features. Experiments conducted on two
benchmark datasets confirm the superiority of the proposed model over previous
models in terms of performance criteria.
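As a rough, illustrative sketch of the three phases, the following Python code pairs hand-crafted descriptors (an SVD spectrum and a color-tone histogram standing in for the DT-CWT/SVD/color-tone features), classic log-entropy tag weighting (the scheme the proposed LEAE builds on), and an LSTM tag decoder. The paper publishes no reference code, so every function name, dimension, and simplification below is an assumption, not the authors' implementation.

```python
# Minimal sketch of the three-phase AIA pipeline (illustrative only).
import numpy as np
import torch
import torch.nn as nn

def handcrafted_features(image: np.ndarray, k: int = 8) -> np.ndarray:
    """Phase 1 (simplified): top-k singular values plus a coarse color-tone
    histogram; image is a float (H, W, 3) array with values in [0, 1]."""
    gray = image.mean(axis=2)
    s = np.linalg.svd(gray, compute_uv=False)[:k]           # global structure
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(4, 4, 4), range=[(0.0, 1.0)] * 3)
    hist = hist.ravel() / max(hist.sum(), 1.0)              # color-tone distribution
    return np.concatenate([s / max(s.max(), 1e-8), hist])

def log_entropy_weights(counts: np.ndarray) -> np.ndarray:
    """Phase 2 (simplified): classic log-entropy weighting of an
    (n_images, n_tags) tag-occurrence matrix."""
    n = counts.shape[0]
    p = counts / np.maximum(counts.sum(axis=0, keepdims=True), 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = np.where(p > 0, p * np.log(p), 0.0).sum(axis=0)  # <= 0
    g = 1.0 + ent / max(np.log(n), 1e-8)                    # global weight in [0, 1]
    return np.log1p(counts) * g                             # local * global

class LSTMAnnotator(nn.Module):
    """Phase 3 (simplified): decode a tag sequence from the image feature."""
    def __init__(self, feat_dim: int, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)   # image feature -> initial state
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        h0 = torch.tanh(self.proj(feats)).unsqueeze(0)      # (1, batch, hidden)
        y, _ = self.lstm(self.embed(tag_ids), (h0, torch.zeros_like(h0)))
        return self.out(y)                                  # per-step tag logits
```

The log-entropy step shows why such weighting can rebalance a skewed keyword dictionary: a tag spread uniformly over all images has entropy near -log n, so its global weight g approaches 0, while a rare, concentrated tag keeps g near 1.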
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- A Generative Approach for Wikipedia-Scale Visual Entity Recognition [56.55633052479446]
We address the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
We introduce a novel Generative Entity Recognition framework, which learns to auto-regressively decode a semantic and discriminative "code" identifying the target entity.
arXiv Detail & Related papers (2024-03-04T13:47:30Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Reinforcement Learning from Diffusion Feedback: Q* for Image Search [2.5835347022640254]
We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
arXiv Detail & Related papers (2023-11-27T09:20:12Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Rethinking Benchmarks for Cross-modal Image-text Retrieval [44.31783230767321]
Cross-modal semantic understanding and matching is a major challenge in image-text retrieval.
In this paper, we review the two common benchmarks and observe that they are insufficient to assess the true capability of models on fine-grained cross-modal semantic matching.
We propose a novel semi-automatic renovation approach to refine coarse-grained sentences into finer-grained ones with little human effort.
The results show that even the state-of-the-art models have much room for improvement in fine-grained semantic understanding.
arXiv Detail & Related papers (2023-04-21T09:07:57Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes (a generic sketch of this idea follows the list).
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
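As a generic illustration of the images-as-graph-nodes idea in the last entry above, the sketch below runs one round of message passing over a group of image embeddings. It is not the cited paper's architecture; the cosine-similarity adjacency, the single layer, and all sizes are assumptions chosen for brevity.

```python
# Generic group-wise message passing over images as graph nodes
# (illustrative only; not the architecture of the cited paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupGNNLayer(nn.Module):
    """One round of feature propagation among the images in a group."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (n_images, dim), one node per image in the group.
        sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
        adj = F.softmax(sim, dim=-1)             # soft adjacency over the group
        msgs = adj @ feats                       # aggregate neighbor features
        return F.relu(self.lin(msgs)) + feats    # residual update

# Usage: refine 5 image embeddings with shared group context.
refined = GroupGNNLayer(128)(torch.randn(5, 128))   # -> (5, 128)
```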