360-MLC: Multi-view Layout Consistency for Self-training and
Hyper-parameter Tuning
- URL: http://arxiv.org/abs/2210.12935v1
- Date: Mon, 24 Oct 2022 03:31:48 GMT
- Title: 360-MLC: Multi-view Layout Consistency for Self-training and
Hyper-parameter Tuning
- Authors: Bolivar Solarte, Chin-Hsuan Wu, Yueh-Cheng Liu, Yi-Hsuan Tsai, Min Sun
- Abstract summary: We present 360-MLC, a self-training method based on multi-view layout consistency for fine-tuning monocular room-layout models.
We leverage the entropy of multiple layout estimations as a quantitative measure of the scene's geometric consistency.
- Score: 40.93848397359068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present 360-MLC, a self-training method based on multi-view layout
consistency for fine-tuning monocular room-layout models using unlabeled
360-images only. This can be valuable in practical scenarios where a
pre-trained model needs to be adapted to a new data domain without using any
ground truth annotations. Our simple yet effective assumption is that multiple
layout estimations in the same scene must define a consistent geometry
regardless of their camera positions. Based on this idea, we leverage a
pre-trained model to project estimated layout boundaries from several camera
views into 3D world coordinates. Then, we re-project them back to
spherical coordinates and build a probability function, from which we sample the
pseudo-labels for self-training. To handle unconfident pseudo-labels, we
evaluate the variance in the re-projected boundaries as an uncertainty value to
weight each pseudo-label in our loss function during training. In addition,
since ground truth annotations are available neither during training nor at
test time, we leverage the entropy of multiple layout estimations as
a quantitative measure of the scene's geometric consistency,
allowing us to evaluate any layout estimator for hyper-parameter tuning,
including model selection without ground truth annotations. Experimental
results show that our solution achieves favorable performance against
state-of-the-art methods when self-training from three publicly available
source datasets to a unique, newly labeled dataset consisting of multiple
views of the same scenes.
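The fusion step lends itself to a compact sketch. Below is a minimal NumPy illustration, assuming the layout boundary of a panorama is represented as one elevation angle per image column and that the boundaries from V source views have already been re-projected into the target view (the re-projection itself needs the registered camera poses and is omitted here). All names (pseudo_label_and_weights, scene_entropy) are illustrative, not the authors' implementation; the per-column mean and variance stand in for the paper's probability function, pseudo-label sampling, and uncertainty estimate.

```python
import numpy as np

def pseudo_label_and_weights(reproj, eps=1e-6):
    """Fuse multi-view boundaries re-projected into one target panorama.

    reproj: (V, W) array of boundary elevation angles (radians), one row
        per source view, indexed by image column (azimuth).
    Returns a per-column pseudo-label and a per-column confidence weight;
    strong cross-view disagreement yields a low weight.
    """
    pseudo = reproj.mean(axis=0)      # consensus boundary per column
    var = reproj.var(axis=0)          # cross-view disagreement
    w = 1.0 / (var + eps)             # low variance -> high confidence
    return pseudo, w / w.max()        # weights normalized to (0, 1]

def weighted_boundary_loss(pred, pseudo, w):
    """Uncertainty-weighted L1 loss for self-training."""
    return float(np.mean(w * np.abs(pred - pseudo)))

def scene_entropy(reproj, bins=64):
    """Mean per-column entropy of the boundary distribution.

    Low entropy means the re-projected estimates agree, so this score
    can rank models or hyper-parameters without ground truth labels.
    """
    lo, hi = reproj.min(), reproj.max() + 1e-9
    ent = []
    for col in reproj.T:              # one image column at a time
        hist, _ = np.histogram(col, bins=bins, range=(lo, hi))
        p = hist[hist > 0] / hist.sum()
        ent.append(-(p * np.log(p)).sum())
    return float(np.mean(ent))
```

For example, with reproj of shape (8, 1024) from eight registered views, pseudo_label_and_weights(reproj) yields the training targets and loss weights for one target panorama, while comparing scene_entropy across checkpoints gives a label-free signal for model selection.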
Related papers
- Self-training Room Layout Estimation via Geometry-aware Ray-casting [27.906107629563852]
We introduce a geometry-aware self-training framework for room layout estimation models on unseen scenes with unlabeled data.
Our approach utilizes a ray-casting formulation to aggregate multiple estimates from different viewing positions.
arXiv Detail & Related papers (2024-07-21T03:25:55Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Not Every Side Is Equal: Localization Uncertainty Estimation for Semi-Supervised 3D Object Detection [38.77989138502667]
Semi-supervised 3D object detection from point cloud aims to train a detector with a small number of labeled data and a large number of unlabeled data.
Existing methods treat each pseudo bounding box as a whole and assign equal importance to each side during training.
We propose a side-aware framework for semi-supervised 3D object detection consisting of three key designs.
arXiv Detail & Related papers (2023-12-16T09:08:03Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
MV-JAR is a Masked Voxel Jigsaw and Reconstruction method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation.
To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty.
We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
arXiv Detail & Related papers (2022-11-24T03:27:00Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Self-supervised 360$^{\circ}$ Room Layout Estimation [20.062713286961326]
We present the first self-supervised method to train panoramic room layout estimation models without any labeled data.
Our approach also shows promising solutions in data-scarce scenarios and active learning, which would have an immediate value in real estate virtual tour software.
arXiv Detail & Related papers (2022-03-30T04:58:07Z)
- Towards General Purpose Geometry-Preserving Single-View Depth Estimation [1.9573380763700712]
Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics.
Recent works have shown that a successful solution strongly relies on the diversity and volume of training data.
Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry.
arXiv Detail & Related papers (2020-09-25T20:06:13Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose KM3D-Net, a novel framework for monocular 3D object detection using only RGB images.
We design a fully convolutional model to predict object keypoints, dimensions, and orientation, and then combine these estimates with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.