ShelfRectNet: Single View Shelf Image Rectification with Homography Estimation
- URL: http://arxiv.org/abs/2511.20335v1
- Date: Tue, 25 Nov 2025 14:14:17 GMT
- Title: ShelfRectNet: Single View Shelf Image Rectification with Homography Estimation
- Authors: Onur Berk Tore, Ibrahim Samil Yalciner, Server Calap
- Abstract summary: We present a deep learning framework that predicts a 4-point parameterized homography matrix to rectify shelf images captured from arbitrary angles. Our method achieves a mean corner error of 1.298 pixels on the test set. To encourage further research in this domain, we will make our dataset, ShelfRectSet, and code publicly available.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating homography from a single image remains a challenging yet practically valuable task, particularly in domains like retail, where only one viewpoint is typically available for shelf monitoring and product alignment. In this paper, we present a deep learning framework that predicts a 4-point parameterized homography matrix to rectify shelf images captured from arbitrary angles. Our model leverages a ConvNeXt-based backbone for enhanced feature representation and adopts normalized coordinate regression for improved stability. To address data scarcity and promote generalization, we introduce a novel augmentation strategy by modeling and sampling synthetic homographies. Our method achieves a mean corner error of 1.298 pixels on the test set. When compared with both classical computer vision and deep learning-based approaches, our method demonstrates competitive performance in both accuracy and inference speed. Together, these results establish our approach as a robust and efficient solution for real-world single-view rectification. To encourage further research in this domain, we will make our dataset, ShelfRectSet, and code publicly available.
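The 4-point parameterization mentioned in the abstract represents a homography by the 2D offsets of four image corners rather than by the eight matrix entries directly; the full 3x3 matrix can then be recovered from the four corner correspondences. The sketch below illustrates this recovery with a standard DLT solve in NumPy. It is a minimal illustration of the parameterization, not the paper's actual model or code; the corner offsets here are arbitrary example values standing in for a network's prediction.

```python
import numpy as np

def homography_from_4pt(src, dst):
    """Recover a 3x3 homography H mapping src -> dst from 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the 8x9 system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A, dtype=float)
    # h is the right singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2,2] = 1

# 4-point parameterization: instead of predicting H directly, a network
# predicts per-corner offsets; the homography follows from the correspondences.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
offsets = np.array([[0.02, -0.01], [-0.03, 0.01], [0.01, 0.02], [0.00, -0.02]])
H = homography_from_4pt(corners, corners + offsets)
```

Regressing bounded corner offsets (here in normalized coordinates) is generally better conditioned than regressing the eight entries of H, whose magnitudes vary wildly, which is presumably why the paper adopts this parameterization.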
Related papers
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer [40.778996326009185]
We present the first visual localization framework that performs multi-view spatial integration through an early-fusion mechanism. Our framework is built upon the VGGT backbone, which encodes multi-view 3D geometry. We propose a novel sparse mask attention strategy that reduces computational cost by avoiding the quadratic complexity of global attention.
arXiv Detail & Related papers (2025-12-26T06:12:17Z) - Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis [97.37770785712475]
We present a generation-based debiasing framework for object detection. Our method significantly narrows the performance gap for underrepresented object groups.
arXiv Detail & Related papers (2025-10-21T02:19:12Z) - Zero-shot Inexact CAD Model Alignment from a Single Image [53.37898107159792]
A practical approach to infer 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image. Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories. We propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories.
arXiv Detail & Related papers (2025-07-04T04:46:59Z) - Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World [8.56549004133167]
Stereo matching methods rely on dense pixel-wise ground truth labels. The scarcity of labeled data and domain gaps between synthetic and real-world images pose notable challenges. We propose a novel framework, BooSTer, that leverages both vision foundation models and large-scale mixed image sources.
arXiv Detail & Related papers (2025-05-13T14:24:38Z) - Co-op: Correspondence-based Novel Object Pose Estimation [14.598853174946656]
Co-op is a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning.
arXiv Detail & Related papers (2025-03-22T11:24:19Z) - Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP. VOP identifies co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone. Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z) - FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation [30.710296843150832]
Estimating relative camera poses between images has been a central problem in computer vision.
We show how to combine the best of both methods; our approach yields results that are both precise and robust.
A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators.
arXiv Detail & Related papers (2024-03-05T18:59:51Z) - Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z) - View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z) - Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z) - A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z) - Perceptual Loss for Robust Unsupervised Homography Estimation [1.2891210250935146]
BiHomE minimizes the distance in the feature space between the warped image from the source viewpoint and the corresponding image from the target viewpoint.
We show that biHomE achieves state-of-the-art performance on synthetic COCO dataset, which is also comparable or better compared to supervised approaches.
arXiv Detail & Related papers (2021-04-20T14:41:54Z) - LM-Reloc: Levenberg-Marquardt Based Direct Visual Relocalization [54.77498358487812]
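The biHomE objective above compares the warped source image to the target in feature space rather than pixel space, which makes the loss robust to photometric differences between viewpoints. The sketch below illustrates that idea only; the simple gradient-based feature map is a hypothetical stand-in for the learned backbone the paper uses, and `perceptual_loss` is an illustrative name, not the paper's API.

```python
import numpy as np

def gradient_features(img):
    # Hypothetical stand-in for a learned feature extractor:
    # horizontal and vertical finite differences stacked as a 2-channel map.
    gx = np.diff(img, axis=1, prepend=img[:, :1])
    gy = np.diff(img, axis=0, prepend=img[:1, :])
    return np.stack([gx, gy])

def perceptual_loss(warped_src, tgt):
    # Mean squared distance in feature space, not pixel space, so the loss
    # responds to structural misalignment rather than raw intensity changes.
    diff = gradient_features(warped_src) - gradient_features(tgt)
    return float(np.mean(diff ** 2))
```

In the actual method, `warped_src` would be the source image warped by the estimated homography, and the features would come from a pretrained network, so minimizing this distance drives the homography toward the correct alignment without ground-truth labels.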
LM-Reloc is a novel approach for visual relocalization based on direct image alignment.
We propose a loss formulation inspired by the classical Levenberg-Marquardt algorithm to train LM-Net.
arXiv Detail & Related papers (2020-10-13T12:15:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.