Related papers: CRISP: Object Pose and Shape Estimation with Test-Time Adaptation

URL: http://arxiv.org/abs/2412.01052v1
Date: Mon, 02 Dec 2024 02:26:21 GMT
Title: CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Authors: Jingnan Shi, Rajat Talak, Harry Zhang, David Jin, Luca Carlone
Abstract summary: We consider the problem of estimating object pose and shape from an RGB-D image. We introduce CRISP, a category-agnostic object pose and shape estimation pipeline. We also propose an optimization-based pose and shape corrector that can correct estimation errors caused by a domain gap.
Score: 21.51021467386653
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the problem of estimating object pose and shape from an RGB-D image. Our first contribution is to introduce CRISP, a category-agnostic object pose and shape estimation pipeline. The pipeline implements an encoder-decoder model for shape estimation. It uses FiLM-conditioning for implicit shape reconstruction and a DPT-based network for estimating pose-normalized points for pose estimation. As a second contribution, we propose an optimization-based pose and shape corrector that can correct estimation errors caused by a domain gap. Observing that the shape decoder is well behaved in the convex hull of known shapes, we approximate the shape decoder with an active shape model, and show that this reduces the shape correction problem to a constrained linear least squares problem, which can be solved efficiently by an interior point algorithm. Third, we introduce a self-training pipeline to perform self-supervised domain adaptation of CRISP. The self-training is based on a correct-and-certify approach, which leverages the corrector to generate pseudo-labels at test time, and uses them to self-train CRISP. We demonstrate CRISP (and the self-training) on YCBV, SPE3R, and NOCS datasets. CRISP shows high performance on all the datasets. Moreover, our self-training is capable of bridging a large domain gap. Finally, CRISP also shows an ability to generalize to unseen objects. Code and pre-trained models will be available on https://web.mit.edu/sparklab/research/crisp_object_pose_shape/.
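The shape-correction step described in the abstract can be illustrated with a toy sketch. It assumes the active shape model is a convex combination of known shapes, so the weights are non-negative and sum to one; that reading of "convex hull of known shapes" is an assumption, not a detail confirmed by the abstract. The abstract mentions an interior point algorithm; this sketch swaps in plain projected gradient descent to stay dependency-free. The names `project_to_simplex`, `fit_shape_weights`, `known_shapes`, and `observed` are hypothetical and are not taken from the CRISP code.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def fit_shape_weights(basis, target, steps=200):
    """Minimize ||sum_k c_k * basis[k] - target||^2 over the unit simplex.

    basis  : list of K known shapes, each a flat list of n floats (assumed nonzero)
    target : flat list of n floats describing the observed shape
    Uses projected gradient descent (an illustrative stand-in for an
    interior point solver).
    """
    k = len(basis)
    n = len(target)
    # The squared Frobenius norm bounds the largest eigenvalue of B^T B,
    # so this step size is conservative but safe.
    frob_sq = sum(x * x for shape in basis for x in shape)
    step = 1.0 / (2.0 * frob_sq)
    c = [1.0 / k] * k  # start from the mean shape
    for _ in range(steps):
        blended = [sum(c[j] * basis[j][i] for j in range(k)) for i in range(n)]
        residual = [b - t for b, t in zip(blended, target)]
        grad = [2.0 * sum(s * r for s, r in zip(basis[j], residual))
                for j in range(k)]
        c = project_to_simplex([c_j - step * g_j for c_j, g_j in zip(c, grad)])
    return c


# Toy data: three "known shapes" and an observation inside their convex hull.
known_shapes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
observed = [0.2, 0.5, 0.3, 0.0]
weights = fit_shape_weights(known_shapes, observed)
print([round(w, 3) for w in weights])
```

When the observation lies outside the convex hull, the same routine returns the closest point inside it, which is the sense in which the constraint keeps the corrected shape within the region where the decoder is said to be well behaved.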

Related papers

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization [68.07464514094299]
Existing methods encode all shapes into a fixed-size token, disregarding the inherent variations in scale and complexity across 3D data. We introduce Octree-based Adaptive Tokenization, a novel framework that adjusts the dimension of latent representations according to shape complexity. Our approach reduces token counts by 50% compared to fixed-size methods while maintaining comparable visual quality.
arXiv Detail & Related papers (2025-04-03T17:57:52Z)
RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, the Rectangling Rectification Network (RecRecNet). Our model can flexibly warp the source structure to the target domain and achieves end-to-end unsupervised deformation. Experiments show the superiority of our solution over the compared methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z)
Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space. We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement. The problem is formulated as a non-linear least squares problem based on the estimated correspondence field. The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
Optimal Target Shape for LiDAR Pose Estimation [1.9048510647598205]
Targets are essential in problems such as object tracking in cluttered or textureless environments. Symmetric shapes lead to pose ambiguity when using sparse sensor data. This paper introduces the concept of optimizing target shape to remove pose ambiguity for LiDAR point clouds.
arXiv Detail & Related papers (2021-09-02T19:18:24Z)
Optimal Pose and Shape Estimation for Category-level 3D Object Perception [24.232254155643574]
We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category. We provide the first certifiably optimal CAD solver for pose and shape estimation. We also develop the first graph-theoretic formulation to prune outliers in category-level perception.
arXiv Detail & Related papers (2021-04-16T21:41:29Z)
Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images [18.650642666164252]
We propose an adversarial shape learning network (ASLNet) to model the building shape patterns. Experiments show that the proposed ASLNet improves both the pixel-based accuracy and the object-based measurements by a large margin.
arXiv Detail & Related papers (2021-02-22T18:49:43Z)
From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image. A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes. The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image. We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
Dense Non-Rigid Structure from Motion: A Manifold Viewpoint [162.88686222340962]
The Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover the 3D geometry of a deforming object from its 2D feature correspondences across multiple frames. We show that our approach significantly improves accuracy, scalability, and robustness against noise.
arXiv Detail & Related papers (2020-06-15T09:15:54Z)
Point2Mesh: A Self-Prior for Deformable Meshes [83.31236364265403]
We introduce Point2Mesh, a technique for reconstructing a surface mesh from an input point cloud. The self-prior encapsulates reoccurring geometric repetitions from a single shape within the weights of a deep neural network. We show that Point2Mesh converges to a desirable solution, whereas a prescribed smoothness prior often becomes trapped in undesirable local minima.
arXiv Detail & Related papers (2020-05-22T10:01:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.