Spatial-Semantic Collaborative Cropping for User Generated Content
- URL: http://arxiv.org/abs/2401.08086v1
- Date: Tue, 16 Jan 2024 03:25:12 GMT
- Title: Spatial-Semantic Collaborative Cropping for User Generated Content
- Authors: Yukun Su, Yiwen Cao, Jingliang Deng, Fengyun Rao, Qingyao Wu
- Abstract summary: A large amount of User Generated Content (UGC) is uploaded to the Internet daily and displayed to people worldwide.
Previous methods merely consider the aesthetics of the cropped images while ignoring the content integrity, which is crucial for cropping.
We propose a Spatial-Semantic Collaborative cropping network (S2CNet) for arbitrary user generated content accompanied by a new cropping benchmark.
- Score: 32.490403964193014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large amount of User Generated Content (UGC) is uploaded to the
Internet daily and displayed to people worldwide through client devices (e.g.,
mobile and PC). This requires cropping algorithms to produce aesthetic
thumbnails at specific aspect ratios for different devices. However, existing
image cropping works mainly focus on landmark or landscape images and fail to
model the relations among multiple objects against the complex backgrounds
typical of UGC. Moreover, previous methods merely consider the aesthetics of
the cropped images while ignoring content integrity, which is crucial for UGC
cropping. In this paper, we propose a Spatial-Semantic Collaborative cropping
network (S2CNet) for arbitrary user generated content, accompanied by a new
cropping benchmark. Specifically, we first mine the visual genes of the
potential objects. The proposed adaptive attention graph then recasts cropping
as a procedure of information association over visual nodes. The underlying
spatial and semantic relations are ultimately centralized to the crop candidate
through differentiable message passing, which helps our network efficiently
preserve both aesthetics and content integrity. Extensive experiments on the
proposed UGCrop5K dataset and other public datasets demonstrate the superiority
of our approach over state-of-the-art counterparts.
Our project is available at https://github.com/suyukun666/S2CNet.
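As a rough illustration of the message-passing idea in the abstract, the minimal PyTorch sketch below aggregates object-node features into a crop-candidate node via attention and scores the candidate. It is an assumption-laden sketch, not the authors' implementation: the feature dimension, the single-head attention, the residual update, and the scoring head are all illustrative choices.

```python
# Minimal sketch (not the S2CNet code): attention-based message passing that
# centralizes object-node features to a crop-candidate node, then scores it.
import torch
import torch.nn as nn

class AttentionMessagePassing(nn.Module):
    def __init__(self, dim: int = 256):  # feature dim is an assumption
        super().__init__()
        self.query = nn.Linear(dim, dim)      # projects the crop-candidate node
        self.key = nn.Linear(dim, dim)        # projects the object nodes
        self.value = nn.Linear(dim, dim)
        self.score_head = nn.Linear(dim, 1)   # hypothetical crop-quality head

    def forward(self, crop_feat: torch.Tensor, obj_feats: torch.Tensor):
        # crop_feat: (B, D) feature of one crop candidate
        # obj_feats: (B, N, D) features of N detected objects ("visual nodes")
        q = self.query(crop_feat).unsqueeze(1)                 # (B, 1, D)
        k = self.key(obj_feats)                                # (B, N, D)
        v = self.value(obj_feats)                              # (B, N, D)
        # Adaptive edge weights from scaled dot-product attention.
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        message = (attn @ v).squeeze(1)                        # (B, D) context
        updated = crop_feat + message                          # residual update
        return self.score_head(updated).squeeze(-1)            # (B,) crop score

# Toy usage: batch of 2 candidates, 5 object nodes each.
scores = AttentionMessagePassing()(torch.randn(2, 256), torch.randn(2, 5, 256))
print(scores.shape)  # torch.Size([2])
```

In a full pipeline one would score many candidate crops per image this way and keep the highest-scoring one that satisfies the target aspect ratio.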
Related papers
- Image as Set of Points [60.30495338399321]
Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm.
Our CoCs are convolution- and attention-free, relying only on a clustering algorithm for spatial interaction.
arXiv Detail & Related papers (2023-03-02T18:56:39Z)
- An Experience-based Direct Generation approach to Automatic Image Cropping [0.0]
We propose a novel method to crop images directly without explicitly modeling image aesthetics.
Our model is trained on a large dataset of images cropped by experienced editors.
We show that our strategy is competitive with, or outperforms, existing methods on two related tasks.
arXiv Detail & Related papers (2022-12-30T06:25:27Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing the full image with high-resolution crops that provide finer-grained detail.
With CropFormer, we achieve a significant AP gain of 1.9 on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate Semantic-aware Generation (SaGe), which encourages richer semantics, rather than low-level details, to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Spatial Content Alignment For Pose Transfer [13.018067816407923]
We propose a novel framework to enhance the content consistency of garment textures and the details of human characteristics.
We first alleviate the spatial misalignment by transferring the edge content to the target pose in advance.
Secondly, we introduce a new Content-Style DeBlk which can progressively synthesize photo-realistic person images.
arXiv Detail & Related papers (2021-03-31T06:10:29Z)
- Multi-Modal Retrieval using Graph Neural Networks [1.8911962184174562]
We learn a joint vision and concept embedding in the same high-dimensional space.
We model the visual and concept relationships as a graph structure.
We also introduce a novel inference time control, based on selective neighborhood connectivity.
arXiv Detail & Related papers (2020-10-04T19:34:20Z)
- Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which leads to improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited to multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
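For reference, SceneFID builds on the standard Fréchet Inception Distance. The formula below is the standard FID definition, not taken from the paper above; the object-centric variant presumably computes the same distance over Inception features of object-level crops rather than whole images.

```latex
\mathrm{FID}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of Inception features extracted from real and generated images, respectively.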
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.