Towards Real-World Category-level Articulation Pose Estimation
- URL: http://arxiv.org/abs/2105.03260v1
- Date: Fri, 7 May 2021 13:41:16 GMT
- Title: Towards Real-World Category-level Articulation Pose Estimation
- Authors: Liu Liu, Han Xue, Wenqiang Xu, Haoyuan Fu, Cewu Lu
- Abstract summary: Current Category-level Articulation Pose Estimation (CAPE) methods are studied under the single-instance setting with a fixed kinematic structure for each category.
Considering these limitations, we reformulate this problem setting for real-world environments and propose the CAPE-Real (CAPER) task setting.
This setting allows varied kinematic structures within a semantic category and multiple instances to co-exist in a single real-world observation.
- Score: 46.813224754603866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human life is populated with articulated objects. Current Category-level
Articulation Pose Estimation (CAPE) methods are studied under the
single-instance setting with a fixed kinematic structure for each category.
Considering these limitations, we reformulate this problem setting for
real-world environments and propose the CAPE-Real (CAPER) task setting. This
setting allows varied kinematic structures within a semantic category and
multiple instances to co-exist in a single real-world observation. To support
this task, we build an
articulated model repository ReArt-48 and present an efficient dataset
generation pipeline, which contains Fast Articulated Object Modeling (FAOM) and
Semi-Authentic MixEd Reality Technique (SAMERT). Accompanying the pipeline, we
build a large-scale mixed reality dataset ReArtMix and a real world dataset
ReArtVal. We also propose an effective framework ReArtNOCS that exploits RGB-D
input to estimate part-level pose for multiple instances in a single forward
pass. Extensive experiments demonstrate that the proposed ReArtNOCS can achieve
good performance on both CAPER and CAPE settings. We believe it could serve as
a strong baseline for future research on the CAPER task.
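The abstract describes ReArtNOCS as taking RGB-D input and predicting part-level poses for multiple articulated instances, each of which may have its own kinematic structure, in a single forward pass. As a rough illustration of what such a CAPER-style prediction could contain, here is a minimal Python sketch; the class and field names are hypothetical and are not taken from the ReArtNOCS code.

```python
# Illustrative sketch only: the names below are hypothetical and not taken
# from the ReArtNOCS implementation. It shows one way a CAPER-style result
# (multiple instances, each with its own kinematic structure and per-part
# poses) could be organized.
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class PartPose:
    """Pose of a single rigid part of an articulated instance."""
    rotation: np.ndarray     # (3, 3) rotation matrix in the camera frame
    translation: np.ndarray  # (3,) translation in the camera frame
    size: np.ndarray         # (3,) per-axis extent of the part


@dataclass
class Joint:
    """Connection between two parts; the joint type can vary within a category."""
    parent: int              # index into ArticulatedInstance.parts
    child: int               # index into ArticulatedInstance.parts
    joint_type: str          # e.g. "revolute" or "prismatic"


@dataclass
class ArticulatedInstance:
    """One detected instance; the kinematic structure may differ per instance."""
    category: str
    parts: List[PartPose] = field(default_factory=list)
    joints: List[Joint] = field(default_factory=list)


# Example: a cabinet-like instance with a single prismatic drawer joint.
cabinet = ArticulatedInstance(
    category="storage_furniture",
    parts=[
        PartPose(np.eye(3), np.zeros(3), np.array([0.6, 0.4, 0.8])),
        PartPose(np.eye(3), np.array([0.0, 0.1, 0.3]), np.array([0.5, 0.1, 0.35])),
    ],
    joints=[Joint(parent=0, child=1, joint_type="prismatic")],
)
```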
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models [34.898217885820614]
We present a pipeline that does not require CAD models and allows training a state-of-the-art pose estimator from only a small set of real images.
Our method is based on a NeuS2 object representation, which we learn through a semi-automated procedure based on Structure-from-Motion (SfM) and object-agnostic segmentation.
We evaluate our method on the LINEMOD-Occlusion dataset, extensively studying the impact of its individual components and showing competitive performance with respect to approaches based on CAD models and PBR data.
arXiv Detail & Related papers (2024-07-16T22:48:22Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- CoDEPS: Online Continual Learning for Depth Estimation and Panoptic Segmentation [28.782231314289174]
We introduce online continual learning for deep learning-based monocular depth estimation and panoptic segmentation in new environments.
We propose a novel domain-mixing strategy to generate pseudo-labels to adapt panoptic segmentation.
We explicitly address the limited storage capacity of robotic systems by leveraging sampling strategies to construct a fixed-size replay buffer (a generic sketch follows below).
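The summary above mentions sampling strategies for a fixed-size replay buffer. The sketch below shows one common way to bound replay memory, reservoir sampling; it is an illustration under that assumption, not the specific strategy used by CoDEPS.

```python
# Generic fixed-size replay buffer using reservoir sampling. This is one
# common way to bound memory on a robot; it is an illustration, not the
# sampling strategy described in the CoDEPS paper.
import random
from typing import Any, List


class ReservoirReplayBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: List[Any] = []
        self.num_seen = 0            # total samples offered so far

    def add(self, sample: Any) -> None:
        """Every sample ever offered has an equal chance of being kept."""
        self.num_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            j = random.randrange(self.num_seen)
            if j < self.capacity:
                self.items[j] = sample

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a random replay batch for rehearsal during online adaptation."""
        return random.sample(self.items, min(batch_size, len(self.items)))
```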
arXiv Detail & Related papers (2023-03-17T17:31:55Z)
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset and are unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444]
Recently, a popular solution has been to learn RGB-D features from large-scale synthetic data and apply the model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework that conducts Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on the parameters of the BatchNorm layers (see the generic sketch below).
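FTEA adapts BatchNorm parameters at test time. The snippet below is not the FTEA method itself; it is a generic PyTorch sketch of the underlying idea of re-estimating BatchNorm statistics from unlabeled test batches, and the function name and arguments are illustrative.

```python
# Generic test-time BatchNorm adaptation sketch (PyTorch). Illustration of
# the general idea only, not the FTEA method from the paper.
import torch
import torch.nn as nn


def adapt_batchnorm(model: nn.Module, test_loader, momentum: float = 0.1):
    """Re-estimate BatchNorm running statistics from unlabeled test batches.

    `test_loader` is assumed to yield input tensors only (no labels needed).
    """
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()            # use batch statistics and update running buffers
            m.momentum = momentum
    with torch.no_grad():
        for batch in test_loader:
            model(batch)         # forward pass only; no gradients are computed
    model.eval()
    return model
```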
arXiv Detail & Related papers (2022-04-21T02:35:20Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.