Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence
- URL: http://arxiv.org/abs/2104.07077v1
- Date: Wed, 14 Apr 2021 18:59:08 GMT
- Title: Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence
- Authors: Hyeonwoo Yu and Jean Oh
- Abstract summary: 3D object understanding from a 2D image is a challenging task that infers an additional dimension from reduced-dimensional information.
It is challenging to obtain a large amount of 3D data since achieving 3D annotation is expensive and time-consuming.
We propose a strategy to exploit multiple observations of the object in the image sequence in order to surpass the self-performance.
- Score: 15.815583594196488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a self-supervised learning method for multi-object
pose estimation. 3D object understanding from a 2D image is a challenging task
that infers an additional dimension from reduced-dimensional information. In
particular, the estimation of the 3D localization or orientation of an object
requires precise reasoning, unlike simpler clustering tasks such as object
classification. Therefore, the scale of the training dataset becomes more
crucial. However, it is challenging to obtain a large-scale 3D dataset since
achieving 3D annotation is expensive and time-consuming. If the scale of the
training dataset can be increased by involving the image sequence obtained
from simple navigation, it is possible to overcome the scale limitation of the
dataset and to adapt efficiently to a new environment. However, when the
self-annotation is conducted on a single image by the network itself, the
training performance of the network is bounded by the network's own performance.
Therefore, we propose a strategy to exploit multiple observations of the object
in the image sequence in order to surpass the self-performance: first, the
landmarks for the global object map are estimated through network prediction
and data association, and the corrected annotation for a single frame is
obtained. Then, network fine-tuning is conducted including the dataset
obtained by self-annotation, thereby exceeding the performance boundary of the
network itself. The proposed method was evaluated on the KITTI driving-scene
dataset, and we demonstrate the performance improvement in the pose estimation
of multiple objects in 3D space.
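The two-stage self-annotation strategy described in the abstract (per-frame network predictions, data association into a global object map, and corrected labels projected back to each frame before fine-tuning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the greedy nearest-neighbor association, and the association radius are all hypothetical, and camera poses are assumed known from navigation.

```python
import numpy as np

def to_global(pose_cam, cam_pose):
    """Transform a per-frame object position into the global map frame."""
    R, t = cam_pose  # camera rotation (3x3) and translation (3,)
    return R @ pose_cam + t

def associate(detections, cam_poses, radius=2.0):
    """Greedy data association: per-frame detections whose global positions
    lie within `radius` of an existing track join that landmark track."""
    landmarks = []  # each landmark: list of (frame_idx, global_position)
    for f, (dets, cam) in enumerate(zip(detections, cam_poses)):
        for p in dets:
            g = to_global(np.asarray(p, dtype=float), cam)
            for track in landmarks:
                if np.linalg.norm(track[-1][1] - g) < radius:
                    track.append((f, g))
                    break
            else:
                landmarks.append([(f, g)])
    return landmarks

def corrected_annotations(landmarks, cam_poses):
    """Average each landmark over all of its observations, then project the
    averaged global position back into every observing frame as a
    corrected (pseudo) label in that frame's camera coordinates."""
    corrected = {}  # frame index -> list of corrected positions
    for track in landmarks:
        mean_g = np.mean([g for _, g in track], axis=0)
        for f, _ in track:
            R, t = cam_poses[f]
            corrected.setdefault(f, []).append(R.T @ (mean_g - t))
    return corrected
```

The corrected per-frame labels would then be mixed with the originally annotated data for network fine-tuning, which is how the method aims to exceed the single-image self-performance bound.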
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters.
The network predictions yield a large set of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision [21.710141497071373]
Multiple object detection and pose estimation are vital computer vision tasks.
We propose simultaneous neural modeling of both using monocular vision and 3D model infusion.
Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network.
arXiv Detail & Related papers (2022-11-21T05:18:56Z)
- S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that rules hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z)
- Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives [44.03149443379618]
We propose a cost-effective method for automatically generating a large amount of 3D objects with annotations.
These objects are auto-annotated with part labels originating from primitives.
Considering the large overhead of learning on the generated dataset, we propose a dataset distillation strategy.
arXiv Detail & Related papers (2022-05-25T10:07:07Z)
- Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation [39.334995719523]
Current methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses.
We propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets.
We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T08:01:24Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D objects detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimension, and orientation, and then combine these estimations with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.