Carousel: A High-Resolution Dataset for Multi-Target Automatic Image Cropping
- URL: http://arxiv.org/abs/2511.04680v1
- Date: Thu, 06 Nov 2025 18:59:52 GMT
- Authors: Rafe Loya, Andrew Hamara, Benjamin Estell, Benjamin Kilpatrick, Andrew C. Freeman
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic image cropping is a method for maximizing the human-perceived quality of cropped regions in photographs. Although several works have proposed techniques for producing singular crops, little work has addressed the problem of producing multiple, distinct crops with aesthetic appeal. In this paper, we motivate the problem with a discussion on modern social media applications, introduce a dataset of 277 relevant images and human labels, and evaluate the efficacy of several single-crop models with an image partitioning algorithm as a pre-processing step. The dataset is available at https://github.com/RafeLoya/carousel.
Related papers
- Efficient Multi-Crop Saliency Partitioning for Automatic Image Cropping [0.6906005491572401]
We extend the Fixed Aspect Ratio Cropping algorithm to efficiently extract multiple non-overlapping crops in linear time.
Our approach dynamically adjusts attention thresholds and removes selected crops from consideration without recomputing the entire saliency map.
arXiv Detail & Related papers (2025-06-28T08:32:53Z)
- Cropper: Vision-Language Model for Image Cropping through In-Context Learning [54.34593968212081]
We propose an effective approach to leverage large vision-language models (VLMs) for image cropping.
First, we propose an efficient prompt retrieval mechanism for image cropping to automate the selection of in-context examples.
Second, we introduce an iterative refinement strategy to iteratively enhance the predicted crops.
The proposed framework, which we refer to as Cropper, is applicable to a wide range of cropping tasks, including free-form cropping, subject-aware cropping, and aspect ratio-aware cropping.
arXiv Detail & Related papers (2024-08-14T20:03:03Z)
- Learning Subject-Aware Cropping by Outpainting Professional Photos [69.0772948657867]
We propose a weakly-supervised approach to learn what makes a high-quality subject-aware crop from professional stock images.
Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model.
We are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model.
arXiv Detail & Related papers (2023-12-19T11:57:54Z)
- Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
By rectifying the activations in the cross-attention map, the layout appoints and separates the location of different subjects in the image.
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
- An Experience-based Direct Generation approach to Automatic Image Cropping [0.0]
We propose a novel method to crop images directly without explicitly modeling image aesthetics.
Our model is trained on a large dataset of images cropped by experienced editors.
We show that our strategy is competitive with or performs better than existing methods in two related tasks.
arXiv Detail & Related papers (2022-12-30T06:25:27Z)
- CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping [97.05377757299672]
We present a simple method, CropMix, for producing a rich input distribution from the original dataset distribution.
CropMix can be seamlessly applied to virtually any training recipe and neural network architecture performing classification tasks.
We show that CropMix is of benefit to both contrastive learning and masked image modeling towards more powerful representations.
arXiv Detail & Related papers (2022-05-31T16:57:28Z)
- Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a dense-connected multi-stream fusion network is proposed to effectively fuse the information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
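The Laplacian pyramid blending that this abstract cites as inspiration can be illustrated with a minimal 1-D sketch: decompose both signals into band-pass (Laplacian) levels, blend each level with a progressively smoothed mask, and sum the levels back up. All names, the 1-2-1 smoothing kernel, and the nearest-neighbour upsampling are illustrative choices, not the paper's fusion network.

```python
def down(x):
    """Blur with a 1-2-1 kernel (edge-replicated), then decimate by 2."""
    pad = [x[0]] + list(x) + [x[-1]]
    blurred = [(pad[i - 1] + 2 * pad[i] + pad[i + 1]) / 4.0
               for i in range(1, len(pad) - 1)]
    return blurred[::2]

def up(x, n):
    """Nearest-neighbour upsample back to length n."""
    out = []
    for v in x:
        out += [v, v]
    return out[:n]

def gaussian_pyramid(x, levels):
    pyr = [list(x)]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(x, levels):
    g = gaussian_pyramid(x, levels)
    lap = [[g[i][j] - up(g[i + 1], len(g[i]))[j] for j in range(len(g[i]))]
           for i in range(levels - 1)]
    lap.append(g[-1])                      # coarsest level kept as-is
    return lap

def pyramid_blend(a, b, mask, levels=3):
    """Blend a and b band by band; mask=1 keeps a, mask=0 keeps b."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask, levels)    # smoothed mask per level
    blended = [[m * p + (1 - m) * q for p, q, m in zip(la[i], lb[i], gm[i])]
               for i in range(levels)]
    out = blended[-1]                      # collapse coarse-to-fine
    for i in range(levels - 2, -1, -1):
        u = up(out, len(blended[i]))
        out = [u[j] + blended[i][j] for j in range(len(blended[i]))]
    return out
```

With an all-ones mask the reconstruction returns the first signal (the pyramid decomposition is invertible); intermediate mask values blend low frequencies over wider regions than high frequencies, which is what makes pyramid blends seamless.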
arXiv Detail & Related papers (2020-11-04T06:12:24Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Multi-Image Summarization: Textual Summary from a Set of Cohesive Images [17.688344968462275]
This paper proposes the new task of multi-image summarization.
It aims to generate a concise and descriptive textual summary given a coherent set of input images.
A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes.
arXiv Detail & Related papers (2020-06-15T18:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.