360-MLC: Multi-view Layout Consistency for Self-training and
  Hyper-parameter Tuning
- URL: http://arxiv.org/abs/2210.12935v1
- Date: Mon, 24 Oct 2022 03:31:48 GMT
- Title: 360-MLC: Multi-view Layout Consistency for Self-training and
  Hyper-parameter Tuning
- Authors: Bolivar Solarte, Chin-Hsuan Wu, Yueh-Cheng Liu, Yi-Hsuan Tsai, Min Sun
- Abstract summary: We present 360-MLC, a self-training method based on multi-view layout consistency for finetuning monocular room-layout models.
We leverage the entropy information in multiple layout estimations as a quantitative metric to measure the geometry consistency of the scene.
- Score: 40.93848397359068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   We present 360-MLC, a self-training method based on multi-view layout
consistency for finetuning monocular room-layout models using unlabeled
360-images only. This can be valuable in practical scenarios where a
pre-trained model needs to be adapted to a new data domain without using any
ground truth annotations. Our simple yet effective assumption is that multiple
layout estimations in the same scene must define a consistent geometry
regardless of their camera positions. Based on this idea, we leverage a
pre-trained model to project estimated layout boundaries from several camera
views into the 3D world coordinate. Then, we re-project them back to the
spherical coordinate and build a probability function, from which we sample the
pseudo-labels for self-training. To handle unconfident pseudo-labels, we
evaluate the variance in the re-projected boundaries as an uncertainty value to
weight each pseudo-label in our loss function during training. In addition,
since ground truth annotations are available neither during training nor during
testing, we leverage the entropy information in multiple layout estimations as
a quantitative metric to measure the geometry consistency of the scene,
allowing us to evaluate any layout estimator for hyper-parameter tuning,
including model selection without ground truth annotations. Experimental
results show that our solution achieves favorable performance against
state-of-the-art methods when self-training from three publicly available
source datasets to a unique, newly labeled dataset consisting of multiple views
of the same scenes.
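
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: cameras are translated but not rotated, the floor boundary is given as one elevation angle per azimuth, the camera height is a fixed constant, the per-column mean stands in for the probability function (which the abstract does not specify), the 1 / sigma^2 weighting is one possible way to use the variance, and the entropy is taken over a 2D occupancy histogram. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

CAM_HEIGHT = 1.0  # assumed camera height above the floor (arbitrary units)

def boundary_to_world(cam_xz, thetas, phis):
    """Lift a floor boundary (one elevation angle phi < 0 per azimuth theta)
    to world-frame XZ points. Assumes an unrotated camera and phi != 0."""
    cx, cz = cam_xz
    pts = []
    for theta, phi in zip(thetas, phis):
        d = CAM_HEIGHT / math.tan(-phi)  # horizontal distance to the boundary
        pts.append((cx + d * math.sin(theta), cz + d * math.cos(theta)))
    return pts

def reproject(points, cam_xz, n_cols):
    """Re-project world points into a target view and group the resulting
    floor-boundary elevation angles by image column."""
    cx, cz = cam_xz
    cols = defaultdict(list)
    for x, z in points:
        dx, dz = x - cx, z - cz
        d = math.hypot(dx, dz)
        if d < 1e-6:  # point coincides with the camera; skip it
            continue
        theta = math.atan2(dx, dz)  # azimuth in (-pi, pi]
        col = int((theta + math.pi) / (2 * math.pi) * n_cols) % n_cols
        cols[col].append(-math.atan2(CAM_HEIGHT, d))
    return cols

def pseudo_labels(cols):
    """Per column: mean angle as pseudo-label, standard deviation as uncertainty."""
    return {c: (mean(v), pstdev(v)) for c, v in cols.items()}

def weighted_l1(pred, labels, eps=1e-3):
    """L1 loss in which uncertain columns (large sigma) are down-weighted."""
    terms = [abs(pred[c] - mu) / (sigma ** 2 + eps)
             for c, (mu, sigma) in labels.items()]
    return sum(terms) / len(terms)

def layout_entropy(points, cell=0.25):
    """Shannon entropy of a 2D occupancy histogram of the projected points;
    lower values mean the views agree on a more compact geometry."""
    counts = defaultdict(int)
    for x, z in points:
        counts[(math.floor(x / cell), math.floor(z / cell))] += 1
    total = sum(counts.values())
    return -sum(n / total * math.log(n / total) for n in counts.values())

if __name__ == "__main__":
    thetas = [0.0, 0.1, -0.1]
    view_a = boundary_to_world((0.0, 0.0), thetas, [-0.46, -0.46, -0.46])
    view_b = boundary_to_world((0.5, 0.0), thetas, [-0.47, -0.45, -0.46])
    labels = pseudo_labels(reproject(view_a + view_b, (0.0, 0.0), n_cols=64))
    print(labels)
    print(layout_entropy(view_a + view_b))
```

In this sketch, columns where the views disagree get a larger standard deviation and therefore contribute less to the loss, while the entropy value needs no labels at all and could be compared across checkpoints or hyper-parameter settings.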
 
      
Related papers
- Zero-shot Inexact CAD Model Alignment from a Single Image [53.37898107159792]
 A practical approach to infer 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image.
Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories.
We propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories.
 arXiv  Detail & Related papers  (2025-07-04T04:46:59Z)
- Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding [5.035452169519211]
 This paper proposes a robust evaluation protocol to assess the quality of self-supervised features for 3D scene understanding.
We introduce the first self-supervised model that performs similarly to supervised models when only off-the-shelf features are used in a linear probing setup.
Our experiments not only demonstrate that our method achieves competitive performance to supervised models, but also surpasses existing self-supervised approaches by a large margin.
 arXiv  Detail & Related papers  (2025-04-09T09:19:49Z)
- Self-training Room Layout Estimation via Geometry-aware Ray-casting [27.906107629563852]
 We introduce a geometry-aware self-training framework for room layout estimation models on unseen scenes with unlabeled data.
Our approach utilizes a ray-casting formulation to aggregate multiple estimates from different viewing positions.
 arXiv  Detail & Related papers  (2024-07-21T03:25:55Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
 Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
 arXiv  Detail & Related papers  (2023-12-26T12:16:03Z)
- Not Every Side Is Equal: Localization Uncertainty Estimation for
  Semi-Supervised 3D Object Detection [38.77989138502667]
 Semi-supervised 3D object detection from point cloud aims to train a detector with a small number of labeled data and a large number of unlabeled data.
Existing methods treat each pseudo bounding box as a whole and assign equal importance to each side during training.
We propose a side-aware framework for semi-supervised 3D object detection consisting of three key designs.
 arXiv  Detail & Related papers  (2023-12-16T09:08:03Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
 FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
 arXiv  Detail & Related papers  (2023-12-13T18:28:09Z)
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based
  Self-Supervised Pre-Training [58.07391711548269]
 Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
 arXiv  Detail & Related papers  (2023-03-23T17:59:02Z)
- CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
 We introduce a novel method, CPPF++, designed for sim-to-real pose estimation.
To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty.
We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
 arXiv  Detail & Related papers  (2022-11-24T03:27:00Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
 We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
 arXiv  Detail & Related papers  (2022-04-12T15:03:51Z)
- Self-supervised 360° Room Layout Estimation [20.062713286961326]
 We present the first self-supervised method to train panoramic room layout estimation models without any labeled data.
Our approach also shows promising solutions in data-scarce scenarios and active learning, which would have an immediate value in real estate virtual tour software.
 arXiv  Detail & Related papers  (2022-03-30T04:58:07Z)
- Towards General Purpose Geometry-Preserving Single-View Depth Estimation [1.9573380763700712]
 Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics.
Recent works have shown that a successful solution strongly relies on the diversity and volume of training data.
Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry.
 arXiv  Detail & Related papers  (2020-09-25T20:06:13Z)
- Monocular 3D Detection with Geometric Constraints Embedding and
  Semi-supervised Training [3.8073142980733]
 We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimensions, and orientation, and then combine these estimates with perspective geometry constraints to compute the position attribute.
 arXiv  Detail & Related papers  (2020-09-02T00:51:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     