Object-IR: Leveraging Object Consistency and Mesh Deformation for Self-Supervised Image Retargeting
- URL: http://arxiv.org/abs/2510.27236v1
- Date: Fri, 31 Oct 2025 06:57:10 GMT
- Title: Object-IR: Leveraging Object Consistency and Mesh Deformation for Self-Supervised Image Retargeting
- Authors: Tianli Liao, Ran Wang, Siqing Zhang, Lei Li, Guangen Liu, Chenyang Zhao, Heling Cao, Peng Li
- Abstract summary: This paper presents Object-IR, a self-supervised architecture that reformulates image retargeting as a learning-based mesh warping optimization problem. We initialize a uniform rigid mesh at the target aspect ratio and use a convolutional neural network to predict the motion of each mesh grid point, obtaining the deformed mesh. The framework efficiently processes arbitrary input resolutions while maintaining real-time performance on consumer-grade GPUs.
- Score: 18.51504816209345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Eliminating geometric distortion in semantically important regions remains an intractable challenge in image retargeting. This paper presents Object-IR, a self-supervised architecture that reformulates image retargeting as a learning-based mesh warping optimization problem, where the mesh deformation is guided by object appearance consistency and geometric-preserving constraints. Given an input image and a target aspect ratio, we initialize a uniform rigid mesh at the output resolution and use a convolutional neural network to predict the motion of each mesh grid point, obtaining the deformed mesh. The retargeted result is generated by warping the input image according to the rigid mesh in the input image and the deformed mesh in the output resolution. To mitigate geometric distortion, we design a comprehensive objective function incorporating a) an object-consistent loss to ensure that the important semantic objects retain their appearance, b) a geometric-preserving loss to constrain the important meshes to simple scale transforms, and c) a boundary loss to enforce a clean rectangular output. Notably, our self-supervised paradigm eliminates the need for manually annotated retargeting datasets by deriving supervision directly from the input's geometric and semantic properties. Extensive evaluations on the RetargetMe benchmark demonstrate that our Object-IR achieves state-of-the-art performance, outperforming existing methods in quantitative metrics and subjective visual quality assessments. The framework efficiently processes arbitrary input resolutions (average inference time: 0.009s for 1024x683 resolution) while maintaining real-time performance on consumer-grade GPUs. The source code will soon be available at https://github.com/tlliao/Object-IR.
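The mesh-warping setup described in the abstract can be illustrated with a minimal sketch: a uniform grid mesh over the output canvas, plus simplified versions of the boundary and geometric-preserving terms. This is not the authors' implementation; the CNN that predicts vertex motion and the object-consistent appearance loss are omitted, and all function names and the edge-based form of the geometric term are illustrative assumptions.

```python
import numpy as np

def uniform_mesh(h, w, rows, cols):
    """Uniform rigid mesh: (rows+1) x (cols+1) grid vertices over an h x w canvas."""
    ys = np.linspace(0.0, h, rows + 1)
    xs = np.linspace(0.0, w, cols + 1)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([gx, gy], axis=-1)  # shape: (rows+1, cols+1, 2), last dim = (x, y)

def boundary_loss(mesh, h, w):
    """Penalize border vertices that drift off the rectangular output frame."""
    left   = mesh[:, 0, 0]    # x of left column should be 0
    right  = mesh[:, -1, 0]   # x of right column should be w
    top    = mesh[0, :, 1]    # y of top row should be 0
    bottom = mesh[-1, :, 1]   # y of bottom row should be h
    return (np.mean(left**2) + np.mean((right - w)**2)
            + np.mean(top**2) + np.mean((bottom - h)**2))

def geometric_preserving_loss(mesh, src_mesh):
    """Encourage mesh edges to be a uniform scaling of the source edges,
    i.e. shape preserved up to a single scale factor (a simplification of
    the paper's per-object geometric constraint)."""
    def edges(m):
        return np.concatenate([
            (m[:, 1:] - m[:, :-1]).reshape(-1, 2),   # horizontal edges
            (m[1:, :] - m[:-1, :]).reshape(-1, 2),   # vertical edges
        ])
    e, e0 = edges(mesh), edges(src_mesh)
    s = np.sum(e * e0) / np.sum(e0 * e0)             # least-squares global scale
    return np.mean(np.sum((e - s * e0) ** 2, axis=-1))
```

In the full method, a CNN would output per-vertex offsets that are added to the uniform mesh, and the total objective would combine these terms with the object-consistent appearance loss before warping the input image between the source and deformed meshes.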
Related papers
- Enhancing Rotated Object Detection via Anisotropic Gaussian Bounding Box and Bhattacharyya Distance [0.9786690381850356]
This paper introduces an improved loss function aimed at enhancing detection accuracy and robustness. We advocate for the use of an anisotropic Gaussian representation to address the issues associated with isotropic variance in square-like objects. Our proposed method addresses these challenges by incorporating a rotation-invariant loss function that effectively captures the geometric properties of rotated objects.
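The Bhattacharyya distance between two Gaussians, which this related paper builds its loss on, has a standard closed form. A minimal sketch follows; the conversion from rotated bounding boxes to anisotropic Gaussians used in the paper is omitted, and the function name is illustrative.

```python
import numpy as np

def bhattacharyya_gaussians(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians:
    D_B = 1/8 (mu2-mu1)^T S^-1 (mu2-mu1) + 1/2 ln(det S / sqrt(det C1 det C2)),
    where S = (C1 + C2) / 2."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    # Mahalanobis-like term for the mean gap
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    # log-determinant term for the covariance mismatch (slogdet for stability)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2
```

The distance is zero for identical Gaussians and grows with both mean separation and covariance mismatch, which is what makes it usable as a similarity measure between box-induced Gaussians.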
arXiv Detail & Related papers (2025-10-18T10:42:30Z)
- Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware Pruning for Compact Object Models [14.555667193538879]
We propose a novel approach that leverages object masks to enable targeted reconstruction, resulting in object-centric models. Our method reconstructs compact object models, yielding object-centric Gaussian and mesh representations that are up to 96% smaller and up to 71% faster to train compared to the baseline. These representations are immediately usable for downstream applications such as appearance editing and physics simulation without additional processing.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-04T08:01:36Z)
- Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis [8.009077765403287]
Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition.
Deep learning-based imaging models usually fail to give accurate performance for geometrically distorted images.
We propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distorted images.
arXiv Detail & Related papers (2022-11-28T01:28:33Z)
- Near-field SAR Image Restoration with Deep Learning Inverse Technique: A Preliminary Study [5.489791364472879]
Near-field synthetic aperture radar (SAR) provides a high-resolution image of a target's scattering distribution (hot spots).
Meanwhile, the imaging result suffers inevitable degradation from sidelobes, clutter, and noise.
To restore the image, current methods make simplified assumptions; for example, the point spread function (PSF) is spatially consistent, the target consists of sparse point scatters, etc.
We reformulate the degradation model into a spatially variable complex-convolution model, where the near-field SAR's system response is considered.
A model-based deep learning network is designed to restore the image.
arXiv Detail & Related papers (2022-08-23T07:29:03Z)
- Object Detection in Aerial Images with Uncertainty-Aware Graph Network [61.02591506040606]
We propose a novel uncertainty-aware object detection framework with a structured-graph, where nodes and edges are denoted by objects.
We refer to our model as Uncertainty-Aware Graph network for object DETection (UAGDet).
arXiv Detail & Related papers (2022-08-05T09:36:13Z)
- Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection [77.3530907443279]
We propose a novel self-supervised framework to detect objects in degraded low resolution images.
Our method achieves superior performance compared with existing methods under varied degradation conditions.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2021-01-07T07:33:38Z)
- Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.