Photozilla: A Large-Scale Photography Dataset and Visual Embedding for
20 Photography Styles
- URL: http://arxiv.org/abs/2106.11359v1
- Date: Mon, 21 Jun 2021 18:45:06 GMT
- Title: Photozilla: A Large-Scale Photography Dataset and Visual Embedding for
20 Photography Styles
- Authors: Trisha Singhal, Junhua Liu, Lucienne T. M. Blessing, Kwan Hui Lim
- Abstract summary: We introduce a large-scale dataset termed 'Photozilla' that includes over 990k images belonging to 10 different photographic styles.
The dataset is then used to train 3 classification models to automatically classify the images into the relevant style.
We report an accuracy of over 68% for identifying 10 other distinct types of photography styles.
- Score: 0.6308539010172307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of social media platforms has been a catalyst for the development
of digital photography that engendered a boom in vision applications. With this
motivation, we introduce a large-scale dataset termed 'Photozilla', which
includes over 990k images belonging to 10 different photographic styles. The
dataset is then used to train 3 classification models to automatically classify
the images into the relevant style, resulting in an accuracy of ~96%. With
the rapid evolution of digital photography, we have seen new types of
photography styles emerging at an exponential rate. On that account, we present
a novel Siamese-based network that uses the trained classification models as
the base architecture to adapt and classify unseen styles with only 25 training
samples. We report an accuracy of over 68% for identifying 10 other distinct
types of photography styles. This dataset can be found at
https://trisha025.github.io/Photozilla/
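The few-shot step described in the abstract (classifying unseen styles from only 25 training samples with a Siamese-style network) can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder here is a hypothetical stand-in (an identity matrix followed by L2 normalization) for the trained classification model, images are toy 4-dimensional vectors, and the names `embed`, `classify`, `style_a`, and `style_b` are assumptions made for illustration only.

```python
import math
import random

# Hypothetical stand-in for a trained, shared (Siamese) encoder.
# In the paper's setting the trained classification model would play this role.
WEIGHTS = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def embed(image, weights):
    """Map an image vector to an L2-normalized embedding with shared weights."""
    vec = [sum(w * x for w, x in zip(row, image)) for row in weights]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, support, weights):
    """Return the style whose support samples are closest to the query on average."""
    q = embed(query, weights)

    def mean_dist(images):
        return sum(distance(q, embed(img, weights)) for img in images) / len(images)

    return min(support, key=lambda style: mean_dist(support[style]))


def make_samples(center, count, rng):
    """Toy data: noisy copies of a class center (placeholder for real images)."""
    return [[c + rng.uniform(-0.1, 0.1) for c in center] for _ in range(count)]


rng = random.Random(0)
# 25 support samples per unseen style, matching the sample count in the abstract.
support = {
    "style_a": make_samples([1.0, 0.0, 0.0, 0.0], 25, rng),
    "style_b": make_samples([0.0, 1.0, 0.0, 0.0], 25, rng),
}

prediction = classify([0.95, 0.05, 0.0, 0.02], support, WEIGHTS)
print(prediction)
```

The design choice illustrated is that both the query and the support samples pass through the same encoder, so a new style only needs a handful of labelled examples rather than retraining the whole classifier.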
Related papers
- ProCrop: Learning Aesthetic Image Cropping from Professional Compositions [57.949730056500634]
ProCrop is a retrieval-based method that leverages professional photography to guide cropping decisions.
We present a large-scale dataset of 242K weakly-annotated images, generated by out-painting professional images.
This composition-aware dataset generation offers diverse high-quality crop proposals guided by aesthetic principles.
arXiv Detail & Related papers (2025-05-28T15:38:44Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Measuring Style Similarity in Diffusion Models [118.22433042873136]
We present a framework for understanding and extracting style descriptors from images.
Our framework comprises a new dataset curated using the insight that style is a subjective property of an image.
We also propose a method to extract style attribute descriptors that can be used to attribute the style of a generated image to the images used in the training dataset of a text-to-image model.
arXiv Detail & Related papers (2024-04-01T17:58:30Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Multimodal Foundation Models for Zero-shot Animal Species Recognition in
Camera Trap Images [57.96659470133514]
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery; however, training such techniques requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
arXiv Detail & Related papers (2023-11-02T08:32:00Z)
- New Benchmarks for Asian Facial Recognition Tasks: Face Classification
with Large Foundation Models [3.437372707846067]
This paper introduces a new Large-Scale Korean Influencer dataset named KoIn.
Most of the images in our proposed dataset have been collected from social network services (SNS) such as Instagram.
Our dataset, KoIn, contains over 100,000 K-influencer photos from over 100 Korean celebrity classes.
arXiv Detail & Related papers (2023-10-15T06:51:03Z)
- Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
- Vision Models Are More Robust And Fair When Pretrained On Uncurated
Images Without Supervision [38.22842778742829]
Discriminative self-supervised learning allows training models on any random group of internet images.
We train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn.
We extensively study and validate our model performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets.
arXiv Detail & Related papers (2022-02-16T22:26:47Z)
- Florida Wildlife Camera Trap Dataset [48.99466876948454]
We introduce a challenging wildlife camera trap classification dataset collected from two different locations in Southwestern Florida.
The dataset consists of 104,495 images featuring visually similar species, varying illumination conditions, skewed class distribution, and samples of endangered species.
arXiv Detail & Related papers (2021-06-23T18:53:15Z)
- How many images do I need? Understanding how sample size per class
affects deep learning model performance metrics for balanced designs in
autonomous wildlife monitoring [0.0]
We explore in depth the issues of deep learning model performance for progressively increasing per class (species) sample sizes.
We provide ecologists with an approximation formula to estimate, a priori, how many images per animal species they need for a certain accuracy level.
arXiv Detail & Related papers (2020-10-16T06:28:35Z)
- Salienteye: Maximizing Engagement While Maintaining Artistic Style on
Instagram Using Deep Neural Networks [27.469454386934274]
We use transfer learning to adapt Xception, which is a model for object recognition trained on the ImageNet dataset, to the task of engagement prediction.
We also use Gram matrices generated from VGG19, another object recognition model trained on ImageNet, for the task of style similarity measurement.
Our models can be trained on individual Instagram accounts to create personalized engagement prediction and style similarity models.
arXiv Detail & Related papers (2020-06-13T01:58:02Z)