Related papers: Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping

URL: http://arxiv.org/abs/2201.02280v1
Date: Fri, 7 Jan 2022 00:23:40 GMT
Title: Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping
Authors: Nora Horanyi, Kedi Xia, Kwang Moo Yi, Abhishake Kumar Bojja, Ales Leonardis, Hyung Jin Chang
Abstract summary: We propose a novel optimization framework that crops a given image based on user description and aesthetics. Our framework can produce crops that are well-aligned to intended user descriptions and aesthetically pleasing.
Score: 33.46066328197085
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel optimization framework that crops a given image based on user description and aesthetics. Unlike existing image cropping methods, where one typically trains a deep network to regress to crop parameters or cropping actions, we propose to directly optimize for the cropping parameters by repurposing pre-trained networks on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding training a separate network. Specifically, we search for the best crop parameters that minimize a combined loss of the initial objectives of these networks. To make the optimization tractable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, therefore effectively reducing the parameter space, (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our framework can produce crops that are well-aligned to intended user descriptions and aesthetically pleasing.
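
The idea of optimizing crop parameters directly, rather than training a network to predict them, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the loss is a hypothetical stand-in (negative mean brightness) for the combined captioning and aesthetic loss, gradients are estimated by finite differences instead of backpropagating through pre-trained networks, the sampler is single-scale rather than multi-scale, and all function names and hyperparameters are invented here.

```python
import numpy as np


def bilinear_crop(image, cx, cy, scale, out_size=8):
    """Resample a square crop of a 2-D grayscale image with bilinear interpolation.

    (cx, cy) is the crop centre and `scale` the crop side length, all in
    normalised [0, 1] image coordinates.
    """
    h, w = image.shape
    # Sample positions relative to the crop centre, symmetric around zero.
    offsets = (np.arange(out_size) + 0.5) / out_size - 0.5
    xs = np.clip((cx + scale * offsets) * (w - 1), 0, w - 1)
    ys = np.clip((cy + scale * offsets) * (h - 1), 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = xs - x0                # horizontal weights, broadcast over columns
    fy = (ys - y0)[:, None]     # vertical weights, broadcast over rows
    top = image[np.ix_(y0, x0)] * (1 - fx) + image[np.ix_(y0, x1)] * fx
    bottom = image[np.ix_(y1, x0)] * (1 - fx) + image[np.ix_(y1, x1)] * fx
    return top * (1 - fy) + bottom * fy


def toy_loss(crop):
    # Hypothetical stand-in for the combined captioning + aesthetic loss.
    return -float(crop.mean())


def optimize_crop(image, init, scales=(0.8, 0.6, 0.4), steps=50, lr=0.05, eps=1e-2):
    """Gradient descent on the crop centre while the crop scale is annealed."""
    cx, cy = init
    for scale in scales:  # annealing schedule: large crops first, then smaller
        lo, hi = scale / 2, 1 - scale / 2  # keep the crop inside the image
        cx, cy = float(np.clip(cx, lo, hi)), float(np.clip(cy, lo, hi))
        for _ in range(steps):
            def loss_at(x, y):
                return toy_loss(bilinear_crop(image, x, y, scale))
            # Central finite differences (an assumption; swapped in for autograd).
            gx = (loss_at(cx + eps, cy) - loss_at(cx - eps, cy)) / (2 * eps)
            gy = (loss_at(cx, cy + eps) - loss_at(cx, cy - eps)) / (2 * eps)
            cx = float(np.clip(cx - lr * gx, lo, hi))
            cy = float(np.clip(cy - lr * gy, lo, hi))
    return cx, cy, scales[-1]


def aggregate(results):
    # Combine several runs by taking the per-parameter median.
    return tuple(np.median(np.asarray(results), axis=0))


def make_blob_image(size=32, center=(0.7, 0.3), sigma=0.15):
    """Synthetic test image: one bright Gaussian blob on a dark background."""
    coords = np.linspace(0.0, 1.0, size)
    xx, yy = np.meshgrid(coords, coords)  # xx varies along columns
    d2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))


if __name__ == "__main__":
    img = make_blob_image()
    inits = [(0.5, 0.5), (0.45, 0.55), (0.55, 0.45)]
    print(aggregate([optimize_crop(img, init) for init in inits]))
```

On the synthetic image the aggregated crop centre should drift toward the bright blob. In a real setting the toy loss would be replaced by losses computed from frozen pre-trained networks, and the sampler would need to be differentiable so gradients can flow back to the crop parameters.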

Related papers

Intrinsic Image Fusion for Multi-View 3D Material Reconstruction [49.43509537480623]
We introduce Intrinsic Image Fusion, a method that reconstructs high-quality physically based materials from multi-view images. Our results outperform state-of-the-art methods in material disentanglement on both synthetic and real scenes.
arXiv Detail & Related papers (2025-12-15T10:05:59Z)
Dynamic Classifier-Free Diffusion Guidance via Online Feedback [53.54876309092376]
A "one-size-fits-all" approach fails to adapt to the diverse requirements of different prompts. We introduce a framework for dynamic CFG scheduling. We demonstrate the effectiveness of our approach on both small-scale models and the state-of-the-art Imagen 3.
arXiv Detail & Related papers (2025-09-19T16:27:19Z)
Efficient Multi-Crop Saliency Partitioning for Automatic Image Cropping [0.6906005491572401]
We extend the Fixed Aspect Ratio Cropping algorithm to efficiently extract multiple non-overlapping crops in linear time. Our approach dynamically adjusts attention thresholds and removes selected crops from consideration without recomputing the entire saliency map.
arXiv Detail & Related papers (2025-06-28T08:32:53Z)
Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction [30.529707438964596]
We present a self-calibrating framework that jointly optimizes camera parameters, lens distortion and 3D Gaussian representations. Our technique enables high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images.
arXiv Detail & Related papers (2025-02-13T18:15:10Z)
Preserving Deep Representations In One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework [12.331056472174275]
We present SNOWS, a one-shot post-training pruning framework aimed at reducing the cost of vision network inference without retraining. A key innovation of our framework is the use of Hessian-free optimization to compute exact Newton descent steps without needing to compute or store the full Hessian matrix.
arXiv Detail & Related papers (2024-11-27T14:25:00Z)
Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space. We propose an effective approach to narrow the gap between the two domains. It mainly facilitates unified mutual information sharing both intra- and inter-samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
ATASI-Net: An Efficient Sparse Reconstruction Network for Tomographic SAR Imaging with Adaptive Threshold [13.379416816598873]
This paper proposes a novel efficient sparse unfolding network based on the analytic learned iterative shrinkage thresholding algorithm (ALISTA). The weight matrix in each layer of ATASI-Net is pre-computed as the solution of an off-line optimization problem. In addition, an adaptive threshold is introduced for each azimuth-range pixel, enabling the threshold shrinkage to be not only layer-varied but also element-wise.
arXiv Detail & Related papers (2022-11-30T09:55:45Z)
Differentiable Rendering with Perturbed Optimizers [85.66675707599782]
Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision. Our work highlights the link between some well-known differentiable formulations and randomly smoothed renderings. We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction.
arXiv Detail & Related papers (2021-10-18T08:56:23Z)
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo [97.07453889070574]
We present a new multi-view depth estimation method that utilizes both conventional SfM reconstruction and learning-based priors. We show that our proposed framework significantly outperforms state-of-the-art methods on indoor scenes.
arXiv Detail & Related papers (2021-09-02T17:54:31Z)
Riggable 3D Face Reconstruction via In-Network Optimization [58.016067611038046]
This paper presents a method for riggable 3D face reconstruction from monocular images. It jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations. Experiments demonstrate that our method achieves SOTA reconstruction accuracy, reasonable robustness and generalization ability.
arXiv Detail & Related papers (2021-04-08T03:53:20Z)
Online Exemplar Fine-Tuning for Image-to-Image Translation [32.556050882376965]
Existing techniques to solve exemplar-based image-to-image translation within deep convolutional neural networks (CNNs) generally require a training phase to optimize the network parameters. We propose a novel framework, for the first time, to solve exemplar-based translation through an online optimization given an input image pair. Our framework does not require the off-line training phase, which has been the main challenge of existing methods, but instead relies on pre-trained networks to enable online optimization.
arXiv Detail & Related papers (2020-11-18T15:13:16Z)
Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks [28.32775611169636]
We introduce a new model to apply structured domain adaptation for synthetic image generation and road segmentation. A novel scale-wise architecture is introduced to learn from the multi-level feature maps and improve the semantics of the features. Our model achieves state-of-the-art 78.86 IOU on the Massachusetts dataset with 14.89M parameters and 86.78B FLOPs, with 4x fewer FLOPs but higher accuracy (+3.47% IOU).
arXiv Detail & Related papers (2020-08-10T11:00:19Z)
Perceptually Optimizing Deep Image Compression [53.705543593594285]
Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks. We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
arXiv Detail & Related papers (2020-07-03T14:33:28Z)
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.