Learning Subject-Aware Cropping by Outpainting Professional Photos
- URL: http://arxiv.org/abs/2312.12080v2
- Date: Thu, 4 Apr 2024 13:36:21 GMT
- Title: Learning Subject-Aware Cropping by Outpainting Professional Photos
- Authors: James Hong, Lu Yuan, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian
- Abstract summary: We propose a weakly-supervised approach to learn what makes a high-quality subject-aware crop from professional stock images.
Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model.
We are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model.
- Score: 69.0772948657867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to frame (or crop) a photo often depends on the image subject and its context; e.g., a human portrait. Recent works have defined the subject-aware image cropping task as a nuanced and practical version of image cropping. We propose a weakly-supervised approach (GenCrop) to learn what makes a high-quality, subject-aware crop from professional stock images. Unlike supervised prior work, GenCrop requires no new manual annotations beyond the existing stock image collection. The key challenge in learning from this data, however, is that the images are already cropped and we do not know what regions were removed. Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model. The stock image collection provides diversity and its images serve as pseudo-labels for a good crop, while the text-image diffusion model is used to out-paint (i.e., outward inpainting) realistic uncropped images. Using this procedure, we are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model. Despite being weakly-supervised, GenCrop is competitive with state-of-the-art supervised methods and significantly better than comparable weakly-supervised baselines on quantitative and qualitative evaluation metrics.
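To make the pair-generation recipe concrete, here is a minimal sketch of the outpainting step using the diffusers library's StableDiffusionInpaintPipeline. The pad ratio, prompt, checkpoint, and 512x512 resolution handling are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: turn a professionally cropped stock photo into a (cropped, uncropped)
# training pair by outpainting its surroundings with a diffusion inpainter.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def make_pair(stock_image: Image.Image, prompt: str, pad: float = 0.25):
    """Outpaint `stock_image` on all sides; its original footprint becomes the
    pseudo-ground-truth crop inside the synthesized uncropped image."""
    w, h = stock_image.size
    px, py = int(w * pad), int(h * pad)
    big_w, big_h = w + 2 * px, h + 2 * py

    # Canvas with the stock photo centered; the border is to be synthesized.
    canvas = Image.new("RGB", (big_w, big_h), (128, 128, 128))
    canvas.paste(stock_image, (px, py))

    # Inpainting mask: white = fill in, black = keep the original pixels.
    mask = Image.new("L", (big_w, big_h), 255)
    mask.paste(0, (px, py, px + w, py + h))

    # The inpainting checkpoint works at 512x512; resize in and back out.
    out = pipe(prompt=prompt,
               image=canvas.resize((512, 512)),
               mask_image=mask.resize((512, 512))).images[0]
    uncropped = out.resize((big_w, big_h))

    crop_box = (px, py, px + w, py + h)  # pseudo-label for the "good" crop
    return uncropped, crop_box
```

A cropping model can then be trained to regress `crop_box` from `uncropped`, with the original stock crop serving as the pseudo-label.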
Related papers
- Cropper: Vision-Language Model for Image Cropping through In-Context Learning [57.694845787252916]
The goal of image cropping is to identify visually appealing crops within an image.
Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training.
We propose an effective approach to leverage VLMs for better image cropping.
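As a rough illustration of visual in-context learning for cropping, the sketch below packs a few (image, crop) demonstrations into a chat-style prompt and parses a crop box from the reply. `vlm_chat` is a hypothetical stand-in for any chat-style VLM endpoint, and the JSON protocol is an assumption, not Cropper's actual prompt format.

```python
# In-context cropping sketch: show the VLM a few example crops, then ask it to
# crop a new image in the same style. `vlm_chat` is a hypothetical callable.
import json

def build_messages(examples, query_image_b64):
    """examples: list of (base64_jpeg, (x, y, w, h)) demonstrations."""
    messages = [{"role": "system",
                 "content": 'Suggest an aesthetic crop as JSON '
                            '{"x": int, "y": int, "w": int, "h": int}.'}]
    for img_b64, box in examples:
        messages.append({"role": "user", "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}]})
        messages.append({"role": "assistant",
                         "content": json.dumps(dict(zip("xywh", box)))})
    messages.append({"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{query_image_b64}"}}]})
    return messages

def propose_crop(vlm_chat, examples, query_image_b64):
    reply = vlm_chat(build_messages(examples, query_image_b64))
    box = json.loads(reply)  # assumes the model answers with bare JSON
    return box["x"], box["y"], box["w"], box["h"]
```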
arXiv Detail & Related papers (2024-08-14T20:03:03Z) - MagiCapture: High-Resolution Multi-Concept Portrait Customization [34.131515004434846]
MagiCapture is a personalization method for integrating subject and style concepts to generate high-resolution portrait images.
We present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning within this weakly supervised learning setting.
Our pipeline also includes additional post-processing steps to ensure the creation of highly realistic outputs.
arXiv Detail & Related papers (2023-09-13T11:37:04Z) - Generating images of rare concepts using pre-trained diffusion models [32.5337654536764]
Text-to-image diffusion models can synthesize high-quality images, but they have various limitations.
We show that their limitation is partly due to the long-tail nature of their training data.
We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space.
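A naive way to see why seeds matter: generate the concept under many random seeds and keep the images that an off-the-shelf CLIP model scores highest against the concept text. This brute-force search is only a stand-in for the paper's actual seed-selection procedure in noise space.

```python
# Brute-force seed search for a rare concept, scored by CLIP image-text
# similarity. Checkpoints, seed count, and scoring are illustrative choices.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_seeds(concept: str, n_seeds: int = 32, keep: int = 4):
    scored = []
    for seed in range(n_seeds):
        g = torch.Generator("cuda").manual_seed(seed)
        img = pipe(concept, generator=g, num_inference_steps=30).images[0]
        inputs = proc(text=[concept], images=img, return_tensors="pt")
        with torch.no_grad():
            score = clip(**inputs).logits_per_image.item()
        scored.append((score, seed, img))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:keep]  # (score, seed, image) for the best-matching seeds
```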
arXiv Detail & Related papers (2023-04-27T20:55:38Z) - Saliency Guided Contrastive Learning on Scene Images [71.07412958621052]
We leverage the saliency map derived from the model's output during training to highlight discriminative regions and guide the contrastive learning.
Our method significantly improves self-supervised learning on scene images, with Top-1 accuracy gains of +1.1 on ImageNet linear evaluation and +4.3 and +2.2 on semi-supervised learning with 1% and 10% of ImageNet labels, respectively.
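As a loose sketch of this idea, the code below derives a crude saliency map from an encoder's own spatial features and uses it to place crops on discriminative regions. Taking the channel-mean of the last feature map as saliency is an assumption here; the paper derives its map from the model output differently.

```python
# Saliency-biased cropping sketch: score crop windows by average saliency.
# `encoder` is assumed to return spatial features, e.g. a ResNet truncated
# before global pooling; inputs are assumed at least `size` pixels on a side.
import torch
import torch.nn.functional as F

def saliency_map(encoder, images):            # images: (B, 3, H, W)
    feats = encoder(images)                   # (B, C, h, w) spatial features
    sal = feats.mean(dim=1, keepdim=True)     # crude saliency: channel mean
    sal = F.interpolate(sal, size=images.shape[-2:], mode="bilinear",
                        align_corners=False)
    sal = sal - sal.amin(dim=(2, 3), keepdim=True)
    return sal / (sal.amax(dim=(2, 3), keepdim=True) + 1e-6)  # in [0, 1]

def salient_crop(image, sal, size=224, stride=32):
    """image: (3, H, W); sal: (1, H, W). Return the most salient window."""
    _, H, W = image.shape
    best, best_xy = -1.0, (0, 0)
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            score = sal[0, y:y + size, x:x + size].mean().item()
            if score > best:
                best, best_xy = score, (x, y)
    x, y = best_xy
    return image[:, y:y + size, x:x + size]
```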
arXiv Detail & Related papers (2023-02-22T15:54:07Z) - An Experience-based Direct Generation approach to Automatic Image Cropping [0.0]
We propose a novel method to crop images directly without explicitly modeling image aesthetics.
Our model is trained on a large dataset of images cropped by experienced editors.
We show that our strategy is competitive with or performs better than existing methods in two related tasks.
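The direct-generation idea reduces to plain crop-box regression: a backbone maps the image straight to normalized crop coordinates, supervised by editor crops. The ResNet-18 backbone, (cx, cy, w, h) parametrization, and L1 loss below are assumptions, not the paper's exact design.

```python
# Direct crop regression sketch: image in, normalized crop box out.
import torch
import torch.nn as nn
import torchvision.models as models

class DirectCropper(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 4)

    def forward(self, x):                       # x: (B, 3, H, W)
        return torch.sigmoid(self.backbone(x))  # (cx, cy, w, h) in [0, 1]

model = DirectCropper()
loss_fn = nn.L1Loss()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images, editor_boxes):
    """editor_boxes: (B, 4) crops by experienced editors, scaled to [0, 1]."""
    loss = loss_fn(model(images), editor_boxes)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```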
arXiv Detail & Related papers (2022-12-30T06:25:27Z) - ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust, user-intention-aware cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
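A hedged baseline version of query-conditioned cropping: enumerate candidate windows and keep the one an off-the-shelf CLIP model scores highest against the text query. ClipCrop itself trains a conditioned cropping model; this exhaustive scoring only illustrates the text-guided selection idea.

```python
# Text-guided crop selection by exhaustive CLIP scoring of candidate windows.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def candidate_boxes(w, h, steps=4, scale=0.6):
    """Slide a fixed-scale window over the image on a coarse grid."""
    bw, bh = int(w * scale), int(h * scale)
    for x in range(0, w - bw + 1, max(1, (w - bw) // steps)):
        for y in range(0, h - bh + 1, max(1, (h - bh) // steps)):
            yield (x, y, x + bw, y + bh)

def best_crop(image: Image.Image, query: str):
    boxes = list(candidate_boxes(*image.size))
    crops = [image.crop(b) for b in boxes]
    inputs = proc(text=[query], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = clip(**inputs).logits_per_image.squeeze(1)  # one score per crop
    return boxes[int(sims.argmax())]
```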
arXiv Detail & Related papers (2022-11-21T14:27:07Z) - CtlGAN: Few-shot Artistic Portraits Generation with Contrastive Transfer Learning [77.27821665339492]
CtlGAN is a new few-shot artistic portraits generation model with a novel contrastive transfer learning strategy.
We adapt a pretrained StyleGAN in the source domain to a target artistic domain with no more than 10 artistic faces.
We propose a new encoder that embeds real faces into the Z+ space, together with a dual-path training strategy to better cope with the adapted decoder.
arXiv Detail & Related papers (2022-03-16T13:28:17Z) - Object-Aware Cropping for Self-Supervised Learning [21.79324121283122]
We show that self-supervised learning based on the usual random cropping performs poorly on datasets whose images contain small objects spread across the scene rather than a single, centered subject.
We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm.
Using this approach, which we call object-aware cropping, results in significant improvements over scene cropping on classification and object detection benchmarks.
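A minimal sketch of the two-view sampling: one view stays the usual random resized crop over the whole scene, while the other is taken around a given object-proposal box with slight jitter. The proposal source, jitter amount, and crop scales are illustrative assumptions.

```python
# Object-aware view sampling for contrastive SSL. `proposals` are (x0, y0,
# x1, y1) boxes from any object-proposal method, assumed precomputed.
import random
from PIL import Image

def object_aware_views(image: Image.Image, proposals, jitter=0.15, out=224):
    w, h = image.size

    # View 1: a standard random crop over the whole scene.
    cw = int(w * random.uniform(0.4, 1.0))
    ch = int(h * random.uniform(0.4, 1.0))
    x, y = random.randint(0, w - cw), random.randint(0, h - ch)
    scene_view = image.crop((x, y, x + cw, y + ch)).resize((out, out))

    # View 2: a crop centered on a random object proposal, slightly jittered
    # so the two positive views of the object are not pixel-identical.
    x0, y0, x1, y1 = random.choice(proposals)
    jx = max(1, int((x1 - x0) * jitter))
    jy = max(1, int((y1 - y0) * jitter))
    box = (max(0, x0 - random.randint(0, jx)),
           max(0, y0 - random.randint(0, jy)),
           min(w, x1 + random.randint(0, jx)),
           min(h, y1 + random.randint(0, jy)))
    object_view = image.crop(box).resize((out, out))
    return scene_view, object_view
```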
arXiv Detail & Related papers (2021-12-01T07:23:37Z) - Self-Adaptively Learning to Demoire from Focused and Defocused Image Pairs [97.67638106818613]
Moiré artifacts are common in digital photography, resulting from the interference between high-frequency scene content and the color filter array of the camera.
Existing deep learning-based demoiréing methods trained on large-scale datasets are limited in handling various complex moiré patterns.
We propose a self-adaptive learning method for demoiréing a high-frequency image, with the help of an additional defocused, moiré-free blur image.
arXiv Detail & Related papers (2020-11-03T23:09:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content it presents and is not responsible for any consequences of its use.