MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks
and Shelves
- URL: http://arxiv.org/abs/2211.16882v1
- Date: Wed, 30 Nov 2022 10:32:04 GMT
- Title: MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks
and Shelves
- Authors: Pranjali Pathre, Anurag Sahu, Ashwin Rao, Avinash Prabhu, Meher
Shashwat Nigam, Tanvi Karandikar, Harit Pandya, and K. Madhava Krishna
- Abstract summary: MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack.
With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves.
- Abstract summary: MVRackLay shows superior performance vis-a-vis its single-view counterpart, RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP metrics.
- Score: 8.845291721126825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose and showcase, for the first time, monocular
multi-view layout estimation for warehouse racks and shelves. Unlike typical
layout estimation methods, MVRackLay estimates multi-layered layouts, wherein
each layer corresponds to the layout of a shelf within a rack. Given a sequence
of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture
outputs segmented racks along with the front-view and top-view layout of each
shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering
of all racks, shelves and objects on the shelves, giving an accurate 3D
depiction of the entire warehouse scene in terms of racks, shelves and the
number of objects on each shelf. MVRackLay generalizes to a diverse set of
warehouse scenes with varying numbers of objects on each shelf, varying
numbers of shelves, and other such racks present in the background. Further,
MVRackLay shows superior performance vis-a-vis its single-view counterpart,
RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP
metrics. We also showcase a multi-view stitching of the 3D layouts, resulting in
a representation of the warehouse scene with respect to a global reference
frame, akin to a rendering of the scene from a SLAM pipeline. To the best of our
knowledge, this is the first such work to portray a 3D rendering of a warehouse
scene in terms of its semantic components - Racks, Shelves and Objects - all
from a single monocular camera.
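
The dual-headed Convolutional-LSTM described above can be summarised in code. The PyTorch sketch below is an illustrative assumption, not the authors' implementation: the backbone, channel widths, default number of shelf layers, and the names ConvLSTMCell and DualHeadedLayoutNet are placeholders; only the overall flow stated in the abstract (per-frame encoder, ConvLSTM aggregation over the view sequence, separate top-view and front-view decoder heads) is followed.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """A single convolutional LSTM cell operating on 2D feature maps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g          # update the cell state
        h = o * c.tanh()           # emit the new hidden feature map
        return h, c


class DualHeadedLayoutNet(nn.Module):
    """Per-frame encoder -> ConvLSTM over the sequence -> two layout decoders."""

    def __init__(self, num_shelves=4, hid_ch=128):
        super().__init__()
        # Stand-in per-frame feature encoder (the actual backbone is not specified here).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, hid_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.convlstm = ConvLSTMCell(hid_ch, hid_ch)

        def decoder():
            # Each head predicts one layout/occupancy channel per shelf layer.
            return nn.Sequential(
                nn.ConvTranspose2d(hid_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, num_shelves, 1),
            )

        self.top_head, self.front_head = decoder(), decoder()

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        T = frames.shape[1]
        h = c = None
        for t in range(T):                        # aggregate the monocular view sequence
            feat = self.encoder(frames[:, t])
            if h is None:
                h, c = torch.zeros_like(feat), torch.zeros_like(feat)
            h, c = self.convlstm(feat, (h, c))
        # Dual heads: per-shelf top-view and front-view layout logits.
        return self.top_head(h), self.front_head(h)


# Example: an 8-frame monocular sequence of a rack with 4 shelves.
model = DualHeadedLayoutNet(num_shelves=4)
top, front = model(torch.randn(2, 8, 3, 256, 256))
print(top.shape, front.shape)   # both torch.Size([2, 4, 128, 128])
```

Keeping the temporal aggregation inside a single ConvLSTM over encoder features is what lets a monocular image sequence stand in for multiple calibrated views; the paper's actual losses, decoder design, and rack-segmentation output are not reproduced here. The abstract also mentions stitching the per-view layouts into a global reference frame, akin to a SLAM rendering. One minimal way to do this, assuming metric camera-centred top-view grids and 2D camera poses (x, y, yaw) from an odometry or SLAM module, is sketched below; the function name and grid parameters are hypothetical.

```python
import numpy as np


def stitch_top_views(layouts, poses, cell_size=0.05, global_size=2000):
    """Paste camera-centred top-view occupancy grids into one global map.

    layouts: list of (H, W) arrays, metric grids centred on the camera.
    poses:   list of (x, y, yaw) camera poses in the global frame (m, rad).
    """
    global_map = np.zeros((global_size, global_size), dtype=np.float32)
    origin = global_size // 2                     # global origin at the map centre
    for grid, (x, y, yaw) in zip(layouts, poses):
        H, W = grid.shape
        # Metric coordinates of every cell relative to the camera.
        ys, xs = np.mgrid[0:H, 0:W]
        local = np.stack([(xs - W / 2) * cell_size,
                          (ys - H / 2) * cell_size], axis=-1)
        # Rigid 2D transform into the global frame.
        c, s = np.cos(yaw), np.sin(yaw)
        world = local @ np.array([[c, -s], [s, c]]).T + np.array([x, y])
        idx = (world / cell_size + origin).astype(int)
        ok = ((idx >= 0) & (idx < global_size)).all(axis=-1)
        # Keep the strongest occupancy evidence seen for each global cell.
        np.maximum.at(global_map, (idx[..., 1][ok], idx[..., 0][ok]), grid[ok])
    return global_map
```

Front-view layouts would be composed analogously to recover the full 3D rendering of racks, shelves and objects; the paper's exact stitching procedure may differ.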
Related papers
- 3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface [8.824340350342512]
3DFIRES is a novel system for scene-level 3D reconstruction from posed images.
We show it matches the efficacy of single-view reconstruction methods with only one input.
arXiv Detail & Related papers (2024-03-13T17:59:50Z)
- Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement [49.888011242939385]
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship.
The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects.
arXiv Detail & Related papers (2023-07-10T17:56:06Z)
- SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- Multiview Compressive Coding for 3D Reconstruction [77.95706553743626]
We introduce a simple framework that operates on 3D points of single objects or whole scenes.
Our model, Multiview Compressive Coding, learns to compress the input appearance and geometry to predict the 3D structure.
arXiv Detail & Related papers (2023-01-19T18:59:52Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Monocular Spherical Depth Estimation with Explicitly Connected Weak Layout Cues [27.15511982413305]
We generate a geometric vision (360V) dataset that includes multiple modalities, multi-view stereo data and automatically generated weak layout cues.
We rely on depth-based layout reconstruction and layout-based depth attention, demonstrating increased performance across both tasks.
By using single 360 cameras to scan rooms, the opportunity for facile and quick building-scale 3D scanning arises.
arXiv Detail & Related papers (2022-06-22T20:10:45Z)
- 3D Instance Segmentation of MVS Buildings [5.2517244720510305]
We present a novel framework for instance segmentation of 3D buildings from Multi-view Stereo (MVS) urban scenes.
The emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model.
arXiv Detail & Related papers (2021-12-18T11:12:38Z)
- MVLayoutNet:3D layout reconstruction with multi-view panoramas [12.981269280023469]
MVLayoutNet is an end-to-end network for holistic 3D reconstruction from multi-view panoramas.
We jointly train a layout module to produce an initial layout and a novel MVS module to obtain accurate layout geometry.
Our method leads to coherent layout geometry that enables the reconstruction of an entire scene.
arXiv Detail & Related papers (2021-12-12T03:04:32Z)
- RackLay: Multi-Layer Layout Estimation for Warehouse Racks [17.937062635570268]
We present RackLay, a deep neural network for real-time shelf layout estimation from a single image.
RackLay estimates the top-view and front-view layout for each shelf in the considered rack populated with objects.
We also show that fusing the top-view and front-view enables 3D reasoning applications such as metric free space estimation for the considered rack.
arXiv Detail & Related papers (2021-03-16T16:22:31Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame.
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.