An Experience-based Direct Generation approach to Automatic Image Cropping
- URL: http://arxiv.org/abs/2212.14561v1
- Date: Fri, 30 Dec 2022 06:25:27 GMT
- Title: An Experience-based Direct Generation approach to Automatic Image Cropping
- Authors: Casper Christensen and Aneesh Vartakavi
- Abstract summary: We propose a novel method to crop images directly without explicitly modeling image aesthetics.
Our model is trained on a large dataset of images cropped by experienced editors.
We show that our strategy is competitive with or performs better than existing methods in two related tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Image Cropping is a challenging task with many practical downstream
applications. The task is often divided into sub-problems - generating cropping
candidates, finding the visually important regions, and determining aesthetics
to select the most appealing candidate. Prior approaches model one or more of
these sub-problems separately, and often combine them sequentially. We propose
a novel convolutional neural network (CNN) based method to crop images
directly, without explicitly modeling image aesthetics, evaluating multiple
crop candidates, or detecting visually salient regions. Our model is trained on
a large dataset of images cropped by experienced editors and can simultaneously
predict bounding boxes for multiple fixed aspect ratios. We consider the aspect
ratio of the cropped image to be a critical factor that influences aesthetics.
Prior approaches for automatic image cropping did not enforce the aspect ratio
of the outputs, likely due to a lack of datasets for this task. We, therefore,
benchmark our method on public datasets for two related tasks - first,
aesthetic image cropping without regard to aspect ratio, and second, thumbnail
generation that requires fixed aspect ratio outputs, but where aesthetics are
not crucial. We show that our strategy is competitive with or performs better
than existing methods in both these tasks. Furthermore, our one-stage model is
easier to train and significantly faster at inference than existing two-stage
or end-to-end methods. We present a qualitative evaluation study and find that
our model is able to generalize to diverse images from unseen datasets and
often retains compositional properties of the original images after cropping.
Our results demonstrate that explicitly modeling image aesthetics or visual
attention regions is not necessarily required to build a competitive image
cropping algorithm.
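The one-stage formulation described above can be pictured as a single regression network: one backbone pass and one head that emits a normalized bounding box per fixed aspect ratio. The following is a minimal sketch of that idea under assumed choices (a ResNet-50 backbone, a three-number box parameterization, sigmoid-normalized outputs); it is an illustration, not the authors' published architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DirectCropper(nn.Module):
    """Sketch of a one-stage cropper: a CNN backbone plus a single
    regression head that predicts one box per fixed aspect ratio,
    with no candidate generation, saliency, or aesthetic scoring."""

    def __init__(self, aspect_ratios=(1.0, 4 / 3, 16 / 9)):
        super().__init__()
        self.aspect_ratios = aspect_ratios
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        # Each box needs only (x_center, y_center, width): the height is
        # implied by the fixed aspect ratio, so 3 numbers per ratio suffice.
        self.head = nn.Linear(2048, 3 * len(aspect_ratios))

    def forward(self, images):
        feats = self.features(images).flatten(1)            # (N, 2048)
        boxes = torch.sigmoid(self.head(feats))             # normalized to [0, 1]
        return boxes.view(-1, len(self.aspect_ratios), 3)   # (N, ratios, 3)

model = DirectCropper()
boxes = model(torch.randn(2, 3, 224, 224))  # boxes.shape == (2, 3, 3)
```

Training such a model would amount to regressing these outputs against editor-annotated crops (for example with an L1 loss), and inference is a single forward pass, which is why a one-stage design can beat candidate-scoring pipelines on speed.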
Related papers
- Cropper: Vision-Language Model for Image Cropping through In-Context Learning [57.694845787252916]
The goal of image cropping is to identify visually appealing crops within an image.
Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training.
We propose an effective approach to leverage VLMs for better image cropping.
arXiv Detail & Related papers (2024-08-14T20:03:03Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by text-to-image (T2I) models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Spatial-Semantic Collaborative Cropping for User Generated Content [32.490403964193014]
A large amount of User Generated Content (UGC) is uploaded to the Internet daily and displayed to people worldwide.
Previous methods merely consider the aesthetics of the cropped images while ignoring the content integrity, which is crucial for cropping.
We propose a Spatial-Semantic Collaborative cropping network (S2CNet) for arbitrary user generated content accompanied by a new cropping benchmark.
arXiv Detail & Related papers (2024-01-16T03:25:12Z)
- Learning Subject-Aware Cropping by Outpainting Professional Photos [69.0772948657867]
We propose a weakly-supervised approach to learn what makes a high-quality subject-aware crop from professional stock images.
Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model.
We are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model.
arXiv Detail & Related papers (2023-12-19T11:57:54Z)
- Correlational Image Modeling for Self-Supervised Visual Pre-Training [81.82907503764775]
Correlational Image Modeling is a novel and surprisingly effective approach to self-supervised visual pre-training.
Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task.
arXiv Detail & Related papers (2023-03-22T15:48:23Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- Estimating Appearance Models for Image Segmentation via Tensor Factorization [0.0]
We propose a new approach to directly estimate appearance models from the image without prior information on the underlying segmentation.
Our method uses local high-order color statistics from the image as input to a tensor factorization-based estimator for latent variable models.
This approach is able to estimate models in multi-region images and automatically output the region proportions without prior user interaction.
arXiv Detail & Related papers (2022-08-16T17:21:00Z)
- Image Aesthetics Assessment Using Graph Attention Network [17.277954886018353]
We present a two-stage framework based on graph neural networks for image aesthetics assessment.
First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution.
Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-06-26T12:52:46Z)
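To make the feature-graph idea above concrete, here is a minimal sketch in which CNN feature-map cells act as graph nodes and multi-head self-attention stands in for a graph attention layer over a fully connected region graph; the dimensions, pooling, and score head are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FeatureGraphScorer(nn.Module):
    """Sketch: treat each cell of a CNN feature map as a graph node and
    let self-attention relate regions, approximating a graph attention
    layer over a fully connected region graph."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)  # hypothetical aesthetic-score head

    def forward(self, feature_map):                      # (N, C, H, W)
        nodes = feature_map.flatten(2).transpose(1, 2)   # (N, H*W, C) nodes
        nodes, _ = self.attn(nodes, nodes, nodes)        # relate all regions
        return self.score(nodes.mean(dim=1))             # (N, 1) score

# Any H x W works, so the input image's aspect ratio need not be distorted.
score = FeatureGraphScorer()(torch.randn(2, 256, 12, 9))
```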
- Dependent Multi-Task Learning with Causal Intervention for Image Captioning [10.6405791176668]
In this paper, we propose a dependent multi-task learning framework with causal intervention (DMTCI).
Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning.
Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders.
Finally, we use a multi-agent reinforcement learning strategy to enable end-to-end training and reduce inter-task error accumulation.
arXiv Detail & Related papers (2021-05-18T14:57:33Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)