ShelfRectNet: Single View Shelf Image Rectification with Homography Estimation
- URL: http://arxiv.org/abs/2511.20335v1
- Date: Tue, 25 Nov 2025 14:14:17 GMT
- Title: ShelfRectNet: Single View Shelf Image Rectification with Homography Estimation
- Authors: Onur Berk Tore, Ibrahim Samil Yalciner, Server Calap
- Abstract summary: We present a deep learning framework that predicts a 4-point parameterized homography matrix to rectify shelf images captured from arbitrary angles. Our method achieves a mean corner error of 1.298 pixels on the test set. To encourage further research in this domain, we will make our dataset, ShelfRectSet, and code publicly available.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating homography from a single image remains a challenging yet practically valuable task, particularly in domains like retail, where only one viewpoint is typically available for shelf monitoring and product alignment. In this paper, we present a deep learning framework that predicts a 4-point parameterized homography matrix to rectify shelf images captured from arbitrary angles. Our model leverages a ConvNeXt-based backbone for enhanced feature representation and adopts normalized coordinate regression for improved stability. To address data scarcity and promote generalization, we introduce a novel augmentation strategy by modeling and sampling synthetic homographies. Our method achieves a mean corner error of 1.298 pixels on the test set. When compared with both classical computer vision and deep learning-based approaches, our method demonstrates competitive performance in both accuracy and inference speed. Together, these results establish our approach as a robust and efficient solution for real-world single-view rectification. To encourage further research in this domain, we will make our dataset, ShelfRectSet, and code publicly available.
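The 4-point parameterization mentioned in the abstract represents a homography by the 2D offsets of four image corners rather than by the eight matrix entries directly; the full 3x3 matrix can then be recovered from the four corner correspondences. The sketch below illustrates this recovery with a standard DLT solve in NumPy. It is a minimal illustration of the parameterization, not the paper's actual model or code; the corner offsets here are arbitrary example values standing in for a network's prediction.

```python
import numpy as np

def homography_from_4pt(src, dst):
    """Recover a 3x3 homography H mapping src -> dst from 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the 8x9 system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A, dtype=float)
    # h is the right singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2,2] = 1

# 4-point parameterization: instead of predicting H directly, a network
# predicts per-corner offsets; the homography follows from the correspondences.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
offsets = np.array([[0.02, -0.01], [-0.03, 0.01], [0.01, 0.02], [0.00, -0.02]])
H = homography_from_4pt(corners, corners + offsets)
```

Regressing bounded corner offsets (here in normalized coordinates) is generally better conditioned than regressing the eight entries of H, whose magnitudes vary wildly, which is presumably why the paper adopts this parameterization.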
Related papers
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer [40.778996326009185]
We present the first visual localization framework that performs multi-view spatial integration through an early-fusion mechanism. Our framework is built upon the VGGT backbone, which encodes multi-view 3D geometry. We propose a novel sparse mask attention strategy that reduces computational cost by avoiding the quadratic complexity of global attention.
arXiv Detail & Related papers (2025-12-26T06:12:17Z) - Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis [97.37770785712475]
We present a generation-based debiasing framework for object detection. Our method significantly narrows the performance gap for underrepresented object groups.
arXiv Detail & Related papers (2025-10-21T02:19:12Z) - Zero-shot Inexact CAD Model Alignment from a Single Image [53.37898107159792]
A practical approach to infer 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image. Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories. We propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories.
arXiv Detail & Related papers (2025-07-04T04:46:59Z) - Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World [8.56549004133167]
Stereo matching methods rely on dense pixel-wise ground truth labels. The scarcity of labeled data and domain gaps between synthetic and real-world images pose notable challenges. We propose a novel framework, BooSTer, that leverages both vision foundation models and large-scale mixed image sources.
arXiv Detail & Related papers (2025-05-13T14:24:38Z) - Co-op: Correspondence-based Novel Object Pose Estimation [14.598853174946656]
Co-op is a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning.
arXiv Detail & Related papers (2025-03-22T11:24:19Z) - Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP. VOP identifies co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone. Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z) - FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation [30.710296843150832]
Estimating relative camera poses between images has been a central problem in computer vision.
We show how to combine the best of both methods; our approach yields results that are both precise and robust.
A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators.
arXiv Detail & Related papers (2024-03-05T18:59:51Z) - Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z) - View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z) - Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z) - A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z) - Perceptual Loss for Robust Unsupervised Homography Estimation [1.2891210250935146]
BiHomE minimizes the distance in the feature space between the warped image from the source viewpoint and the corresponding image from the target viewpoint.
We show that biHomE achieves state-of-the-art performance on synthetic COCO dataset, which is also comparable or better compared to supervised approaches.
arXiv Detail & Related papers (2021-04-20T14:41:54Z) - LM-Reloc: Levenberg-Marquardt Based Direct Visual Relocalization [54.77498358487812]
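The biHomE objective above compares the warped source image to the target in feature space rather than pixel space, which makes the loss robust to photometric differences between viewpoints. The sketch below illustrates that idea only; the simple gradient-based feature map is a hypothetical stand-in for the learned backbone the paper uses, and `perceptual_loss` is an illustrative name, not the paper's API.

```python
import numpy as np

def gradient_features(img):
    # Hypothetical stand-in for a learned feature extractor:
    # horizontal and vertical finite differences stacked as a 2-channel map.
    gx = np.diff(img, axis=1, prepend=img[:, :1])
    gy = np.diff(img, axis=0, prepend=img[:1, :])
    return np.stack([gx, gy])

def perceptual_loss(warped_src, tgt):
    # Mean squared distance in feature space, not pixel space, so the loss
    # responds to structural misalignment rather than raw intensity changes.
    diff = gradient_features(warped_src) - gradient_features(tgt)
    return float(np.mean(diff ** 2))
```

In the actual method, `warped_src` would be the source image warped by the estimated homography, and the features would come from a pretrained network, so minimizing this distance drives the homography toward the correct alignment without ground-truth labels.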
LM-Reloc is a novel approach for visual relocalization based on direct image alignment.
We propose a loss formulation inspired by the classical Levenberg-Marquardt algorithm to train LM-Net.
arXiv Detail & Related papers (2020-10-13T12:15:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.