Related papers: Learning Positive-Incentive Point Sampling in Neural Implicit Fields for Object Pose Estimation

Learning Positive-Incentive Point Sampling in Neural Implicit Fields for Object Pose Estimation

URL: http://arxiv.org/abs/2602.19937v1
Date: Mon, 23 Feb 2026 15:10:00 GMT
Title: Learning Positive-Incentive Point Sampling in Neural Implicit Fields for Object Pose Estimation
Authors: Yifei Shi, Boyan Wan, Xin Xu, Kai Xu,
Abstract summary: Learning neural implicit fields of 3D shapes is a rapidly emerging field that enables shape representation at arbitrary resolutions.<n>We propose a method combining an SO(3)-equivariant convolutional implicit network and a positive-incentive point sampling (PIPS) strategy.<n>Our method outperforms the state-of-the-art on three pose estimation datasets.
Score: 17.52581598592474
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning neural implicit fields of 3D shapes is a rapidly emerging field that enables shape representation at arbitrary resolutions. Due to the flexibility, neural implicit fields have succeeded in many research areas, including shape reconstruction, novel view image synthesis, and more recently, object pose estimation. Neural implicit fields enable learning dense correspondences between the camera space and the object's canonical space-including unobserved regions in camera space-significantly boosting object pose estimation performance in challenging scenarios like highly occluded objects and novel shapes. Despite progress, predicting canonical coordinates for unobserved camera-space regions remains challenging due to the lack of direct observational signals. This necessitates heavy reliance on the model's generalization ability, resulting in high uncertainty. Consequently, densely sampling points across the entire camera space may yield inaccurate estimations that hinder the learning process and compromise performance. To alleviate this problem, we propose a method combining an SO(3)-equivariant convolutional implicit network and a positive-incentive point sampling (PIPS) strategy. The SO(3)-equivariant convolutional implicit network estimates point-level attributes with SO(3)-equivariance at arbitrary query locations, demonstrating superior performance compared to most existing baselines. The PIPS strategy dynamically determines sampling locations based on the input, thereby boosting the network's accuracy and training efficiency. Our method outperforms the state-of-the-art on three pose estimation datasets. Notably, it demonstrates significant improvements in challenging scenarios, such as objects captured with unseen pose, high occlusion, novel geometry, and severe noise.

Related papers

Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z)
3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions. We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression. Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning. SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z)
Unsupervised Occupancy Learning from Sparse Point Cloud [8.732260277121547]
Implicit Neural Representations have gained prominence as a powerful framework for capturing complex data modalities. In this paper, we propose a method to infer occupancy fields instead of Neural Signed Distance Functions. We highlight its capacity to improve implicit shape inference with respect to baselines and the state-of-the-art using synthetic and real data.
arXiv Detail & Related papers (2024-04-03T14:05:39Z)
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images [50.4538089115248]
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task. We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion. Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods.
arXiv Detail & Related papers (2024-03-30T07:17:37Z)
Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation [18.011044932979143]
3DUDA is a method capable of adapting to a nuisance-ridden target domain without 3D or depth data. We represent object categories as simple cuboid meshes, and harness a generative model of neural feature activations. We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions.
arXiv Detail & Related papers (2024-01-19T17:48:05Z)
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task. Recent studies have shown the great potential of dense correspondence-based solutions. We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space. We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
Semi-signed neural fitting for surface reconstruction from unoriented point clouds [53.379712818791894]
We propose SSN-Fitting to reconstruct a better signed distance field. SSN-Fitting consists of a semi-signed supervision and a loss-based region sampling strategy. We conduct experiments to demonstrate that SSN-Fitting achieves state-of-the-art performance under different settings.
arXiv Detail & Related papers (2022-06-14T09:40:17Z)
ImpDet: Exploring Implicit Fields for 3D Object Detection [74.63774221984725]
We introduce a new perspective that views bounding box regression as an implicit function. This leads to our proposed framework, termed Implicit Detection or ImpDet. Our ImpDet assigns specific values to points in different local 3D spaces, thereby high-quality boundaries can be generated.
arXiv Detail & Related papers (2022-03-31T17:52:12Z)
RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement. The problem is formulated as a non-linear least squares problem based on the estimated correspondence field. The correspondence field estimation and pose refinement are conducted alternatively in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space. We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.