Related papers: An Efficient Deep Template Matching and In-Plane Pose Estimation Method via Template-Aware Dynamic Convolution

URL: http://arxiv.org/abs/2510.01678v1
Date: Thu, 02 Oct 2025 05:05:43 GMT
Title: An Efficient Deep Template Matching and In-Plane Pose Estimation Method via Template-Aware Dynamic Convolution
Authors: Ke Jia, Ji Zhou, Hanxin Li, Zhigan Zhou, Haojie Chu, Xiaojie Li
Abstract summary: In industrial inspection and component alignment tasks, template matching requires efficient estimation of a target's position and geometric state. We propose a lightweight end-to-end framework that reformulates template matching as joint localization and geometric regression. Experiments show our 3.07M model achieves high precision and 14ms inference under compound transformations.
Score: 5.201850165450502
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In industrial inspection and component alignment tasks, template matching requires efficient estimation of a target's position and geometric state (rotation and scaling) under complex backgrounds to support precise downstream operations. Traditional methods rely on exhaustive enumeration of angles and scales, leading to low efficiency under compound transformations. Meanwhile, most deep learning-based approaches only estimate similarity scores without explicitly modeling geometric pose, making them inadequate for real-world deployment. To overcome these limitations, we propose a lightweight end-to-end framework that reformulates template matching as joint localization and geometric regression, outputting the center coordinates, rotation angle, and independent horizontal and vertical scales. A Template-Aware Dynamic Convolution Module (TDCM) dynamically injects template features at inference to guide generalizable matching. The compact network integrates depthwise separable convolutions and pixel shuffle for efficient matching. To enable geometric-annotation-free training, we introduce a rotation-shear-based augmentation strategy with structure-aware pseudo labels. A lightweight refinement module further improves angle and scale precision via local optimization. Experiments show our 3.07M model achieves high precision and 14ms inference under compound transformations. It also demonstrates strong robustness in small-template and multi-object scenarios, making it highly suitable for deployment in real-time industrial applications. The code is available at: https://github.com/ZhouJ6610/PoseMatch-TDCM.
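To make the regression target in the abstract concrete, the sketch below converts one predicted pose (center, rotation angle, independent horizontal and vertical scales) into the four corners of the matched region. It is an illustration only and is not taken from the PoseMatch-TDCM code: the function name `pose_to_corners`, the argument order, the use of degrees, and the convention of scaling along the template's own axes before rotating about the center are all assumptions.

```python
import math


def pose_to_corners(cx, cy, angle_deg, sx, sy, tw, th):
    """Return the four corners of the matched region in image coordinates.

    Hypothetical helper (not from the paper): the template of size
    (tw, th) is scaled by (sx, sy) along its own axes, rotated by
    angle_deg with the standard 2D rotation matrix, and centered at
    (cx, cy). Corners are listed in the template's own order:
    top-left, top-right, bottom-right, bottom-left.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Half extents after applying the independent scales.
    hw, hh = 0.5 * tw * sx, 0.5 * th * sy
    corners = []
    for dx, dy in ((-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)):
        # Rotate the scaled offset, then translate to the predicted center.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


if __name__ == "__main__":
    # A 20x10 template found at (100, 50), unrotated and unscaled.
    print(pose_to_corners(100, 50, 0, 1, 1, 20, 10))
```

Under these assumptions a single five-parameter prediction fully determines the oriented box, which is what lets a regression-based matcher avoid the exhaustive enumeration of angles and scales that the abstract attributes to traditional methods.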

Related papers

Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z)
Robust Subpixel Localization of Diagonal Markers in Large-Scale Navigation via Multi-Layer Screening and Adaptive Matching [18.710429100680006]
This paper proposes a robust, high-precision positioning methodology to address localization failures in large-scale flight navigation. The proposed methodology employs a three-tiered framework incorporating multi-layer corner screening and adaptive template matching. Experimental results demonstrate the method's effectiveness in extracting and localizing diagonal markers in complex, large-scale environments.
arXiv Detail & Related papers (2026-01-13T02:51:31Z)
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis [53.10680153186481]
We propose SCas4D, a cascaded optimization framework that leverages structural patterns in 3D Gaussian Splatting for dynamic scenes. By progressively refining deformations from coarse part-level to fine point-level, SCas4D achieves convergence within 100 iterations per time frame. The approach also demonstrates effectiveness in self-supervised articulated object segmentation, novel view synthesis, and dense point tracking tasks.
arXiv Detail & Related papers (2025-10-08T06:39:33Z)
Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale [17.935227965480475]
We introduce a novel method, called Con-NRSfM, for NRSfM under conformal deformations. Our approach performs point-wise reconstruction using 2D selected image warps optimized through a graph-based framework. Our framework decouples constraints on depth and conformal scale, which are inseparable in other approaches.
arXiv Detail & Related papers (2025-10-02T04:46:46Z)
H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction [39.22287224290769]
H3R is a hybrid framework that integrates latent fusion with attention-based feature aggregation. By integrating both paradigms, our approach enhances generalization while converging 2$\times$ faster than existing methods. Our method supports variable-number and high-resolution input views while demonstrating robust cross-dataset generalization.
arXiv Detail & Related papers (2025-08-05T05:56:30Z)
3D Geometric Shape Assembly via Efficient Point Cloud Matching [59.241448711254485]
We introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts. Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task. We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad.
arXiv Detail & Related papers (2024-07-15T08:50:02Z)
Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation Network [18.47001817385548]
We propose a parallel inference network customized for semantic segmentation tasks. We employ a shallow backbone to ensure real-time speed, and propose three core components to compensate for the reduced model capacity to improve accuracy. Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets.
arXiv Detail & Related papers (2024-02-03T22:51:17Z)
SIGMA: Scale-Invariant Global Sparse Shape Matching [50.385414715675076]
We propose a novel mixed-integer programming (MIP) formulation for generating precise sparse correspondences for non-rigid shapes. We show state-of-the-art results for sparse non-rigid matching on several challenging 3D datasets.
arXiv Detail & Related papers (2023-08-16T14:25:30Z)
Adaptive Spot-Guided Transformer for Consistent Local Feature Matching [64.30749838423922]
We propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching. ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z)
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception. Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution. We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids. The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z)
Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling. We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category. We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.