Towards Fine-grained Image Classification with Generative Adversarial
Networks and Facial Landmark Detection
- URL: http://arxiv.org/abs/2109.00891v1
- Date: Sat, 28 Aug 2021 06:32:42 GMT
- Authors: Mahdi Darvish, Mahsa Pouramini, Hamid Bahador
- Abstract summary: We use GAN-based data augmentation to generate extra dataset instances.
We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) Model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained classification remains a challenging task because distinguishing
categories needs learning complex and local differences. Diversity in the pose,
scale, and position of objects in an image makes the problem even more
difficult. Although the recent Vision Transformer models achieve high
performance, they need an extensive volume of input data. To address this
problem, we used GAN-based data augmentation to generate extra
dataset instances. Oxford-IIIT Pets was our dataset of choice for this
experiment. It consists of 37 breeds of cats and dogs with variations in scale,
poses, and lighting, which intensifies the difficulty of the classification
task. Furthermore, we enhanced the recent StyleGAN2-ADA generative
adversarial network (GAN) to generate more realistic
images while preventing overfitting to the training set. We did this by
training a customized version of MobileNetV2 to predict animal facial
landmarks; then, we cropped images accordingly. Lastly, we combined the
synthetic images with the original dataset and compared our proposed method
with standard GAN augmentation and with no augmentation across different subsets of
training data. We validated our work by evaluating the accuracy of fine-grained
image classification on the recent Vision Transformer (ViT) Model.
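The paper does not release code, but the landmark-guided cropping step it describes can be sketched as follows. This is a minimal illustration, assuming the customized MobileNetV2 returns landmarks as (x, y) pixel coordinates; `landmark_crop`, `augment_dataset`, and the `margin` parameter are hypothetical names for this sketch, not the authors' API.

```python
import numpy as np

def landmark_crop(image, landmarks, margin=0.25):
    """Crop an image around predicted facial landmarks.

    image: H x W x C array; landmarks: (N, 2) array of (x, y) points.
    margin expands the landmark bounding box on every side, so the
    crop keeps some context around the face before GAN training.
    """
    h, w = image.shape[:2]
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    dx = (x_max - x_min) * margin
    dy = (y_max - y_min) * margin
    # Clamp the expanded box to the image bounds.
    x0 = max(int(x_min - dx), 0)
    y0 = max(int(y_min - dy), 0)
    x1 = min(int(x_max + dx), w)
    y1 = min(int(y_max + dy), h)
    return image[y0:y1, x0:x1]

def augment_dataset(real_images, synthetic_images):
    """Combine real images with GAN-generated ones into one training set."""
    return real_images + synthetic_images
```

In the paper's pipeline, the cropped faces would be fed to StyleGAN2-ADA for training, and the resulting synthetic images merged with the original Oxford-IIIT Pets data before fine-tuning the ViT classifier.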
Related papers
- Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation [8.777277201807351]
We present SaSPA: Structure and Subject Preserving Augmentation.
Our method does not use real images as guidance, thereby increasing generation flexibility and promoting greater diversity.
We conduct extensive experiments and benchmark SaSPA against both traditional and recent generative data augmentation methods.
arXiv Detail & Related papers (2024-06-20T17:58:30Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that combines object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- Additional Look into GAN-based Augmentation for Deep Learning COVID-19 Image Classification [57.1795052451257]
We study the dependence of the GAN-based augmentation performance on dataset size with a focus on small samples.
We train StyleGAN2-ADA with both sets and then, after validating the quality of generated images, we use the trained GANs as one of the augmentation approaches in multi-class classification problems.
The GAN-based augmentation approach is found to be comparable with classical augmentation in the case of medium and large datasets but underperforms in the case of smaller datasets.
arXiv Detail & Related papers (2024-01-26T08:28:13Z)
- Performance of GAN-based augmentation for deep learning COVID-19 image classification [57.1795052451257]
The biggest challenge in the application of deep learning to the medical domain is the availability of training data.
Data augmentation is a typical methodology used in machine learning when confronted with a limited data set.
In this work, a StyleGAN2-ADA model of Generative Adversarial Networks is trained on the limited COVID-19 chest X-ray image set.
arXiv Detail & Related papers (2023-04-18T15:39:58Z)
- Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
- Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods.
The experimental results using Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z)
- Free Lunch for Co-Saliency Detection: Context Adjustment [14.688461235328306]
We propose a "cost-free" group-cut-paste (GCP) procedure to leverage images from off-the-shelf saliency detection datasets and synthesize new samples.
We collect a novel dataset called Context Adjustment Training. The two variants of our dataset, i.e., CAT and CAT+, consist of 16,750 and 33,500 images, respectively.
arXiv Detail & Related papers (2021-08-04T14:51:37Z)
- Exploring Vision Transformers for Fine-grained Classification [0.0]
We propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes.
We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology.
arXiv Detail & Related papers (2021-06-19T23:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.