Related papers: Semi-automatic 3D Object Keypoint Annotation and Detection for the Masses

Semi-automatic 3D Object Keypoint Annotation and Detection for the Masses

URL: http://arxiv.org/abs/2201.07665v1
Date: Wed, 19 Jan 2022 15:41:54 GMT
Title: Semi-automatic 3D Object Keypoint Annotation and Detection for the Masses
Authors: Kenneth Blomqvist, Jen Jen Chung, Lionel Ott, Roland Siegwart
Abstract summary: We present a semi-automatic way of collecting and labeling datasets using a wrist mounted camera on a standard robotic arm. We are able to obtain a working 3D object keypoint detector and go through the whole process of data collection, annotation and learning in just a couple hours of active time.
Score: 42.34064154798376
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Creating computer vision datasets requires careful planning and lots of time and effort. In robotics research, we often have to use standardized objects, such as the YCB object set, for tasks such as object tracking, pose estimation, grasping and manipulation, as there are datasets and pre-learned methods available for these objects. This limits the impact of our research since learning-based computer vision methods can only be used in scenarios that are supported by existing datasets. In this work, we present a full object keypoint tracking toolkit, encompassing the entire process from data collection, labeling, model learning and evaluation. We present a semi-automatic way of collecting and labeling datasets using a wrist mounted camera on a standard robotic arm. Using our toolkit and method, we are able to obtain a working 3D object keypoint detector and go through the whole process of data collection, annotation and learning in just a couple hours of active time.

Related papers

MonoSOWA: Scalable monocular 3D Object detector Without human Annotations [0.0]
We present a novel method to train a 3D object detector from a single RGB camera without domain-specific human annotations. The method is evaluated on three public datasets, where despite using no human labels, it outperforms prior work by a significant margin.
arXiv Detail & Related papers (2025-01-16T11:35:22Z)
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description [56.69740649781989]
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI.<n>We introduce Articulate3D, an expertly curated 3D dataset featuring high-quality manual annotations on 280 indoor scenes.<n>We also present USDNet, a novel unified framework capable of simultaneously predicting part segmentation along with a full specification of motion attributes for articulated objects.
arXiv Detail & Related papers (2024-12-02T11:33:55Z)
Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images [15.921719523588996]
Existing monocular and RGB-D methods suffer from scale ambiguity due to missing or depth measurements. We present CODERS, a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Our dataset, code, and demos will be available on our project page.
arXiv Detail & Related papers (2024-07-09T15:59:03Z)
SeMoLi: What Moves Together Belongs Together [51.72754014130369]
We tackle semi-supervised object detection based on motion cues. Recent results suggest that motion-based clustering methods can be used to pseudo-label instances of moving objects. We re-think this approach and suggest that both, object detection, as well as motion-inspired pseudo-labeling, can be tackled in a data-driven manner.
arXiv Detail & Related papers (2024-02-29T18:54:53Z)
ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living [4.221961702292134]
ADL4D is a dataset of up to two subjects inter- acting with different sets of objects performing Activities of Daily Living (ADL) Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations. We develop an automatic system for multi-view multi-hand 3D pose an- notation capable of tracking hand poses over time.
arXiv Detail & Related papers (2024-02-27T18:51:52Z)
Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects. We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
PatchContrast: Self-Supervised Pre-training for 3D Object Detection [14.603858163158625]
We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection. We show that our method outperforms existing state-of-the-art models on three commonly-used 3D detection datasets.
arXiv Detail & Related papers (2023-08-14T07:45:54Z)
HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions [17.9178233068395]
We present the HANDAL dataset for category-level object pose estimation and affordance prediction. The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories. We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks.
arXiv Detail & Related papers (2023-08-02T23:59:59Z)
Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem. To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly. We have performed extensive sets of experiments to assess the performance of the proposed approach in offline, and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z)
Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations. We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named regrad to sustain the modeling of relationships among objects and grasps. Our dataset is collected in both forms of 2D images and 3D point clouds. Users are free to import their own object models for the generation of as many data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem. We employ a Neural Message Passing network for data association that is fully trainable. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.