TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes
- URL: http://arxiv.org/abs/2203.09440v2
- Date: Mon, 21 Mar 2022 06:18:32 GMT
- Title: TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes
- Authors: Mutian Xu, Pei Chen, Haolin Liu, Xiaoguang Han
- Abstract summary: We introduce TO-Scene, a large-scale dataset focusing on tabletop scenes.
To acquire the data, a crowdsourcing UI is developed to transfer CAD objects onto tables from ScanNet.
A tabletop-aware learning strategy is proposed for better perceiving the small-sized tabletop instances.
- Score: 24.422147844863304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many basic indoor activities such as eating or writing are always conducted
upon different tabletops (e.g., coffee tables, writing desks). It is
indispensable to understand tabletop scenes in 3D indoor scene parsing
applications. Unfortunately, it is hard to meet this demand by directly
deploying data-driven algorithms, since 3D tabletop scenes are rarely available
in current datasets. To remedy this defect, we introduce TO-Scene, a
large-scale dataset focusing on tabletop scenes, which contains 20,740 scenes
with three variants. To acquire the data, we design an efficient and scalable
framework, where a crowdsourcing UI is developed to transfer CAD objects onto
tables from ScanNet, then the output tabletop scenes are simulated into real
scans and annotated automatically.
Further, a tabletop-aware learning strategy is proposed to better perceive
the small-sized tabletop instances. Notably, we also provide a real scanned
test set TO-Real to verify the practical value of TO-Scene. Experiments show
that the algorithms trained on TO-Scene indeed work on the realistic test data,
and our proposed tabletop-aware learning strategy greatly improves the
state-of-the-art results on both 3D semantic segmentation and object detection
tasks. TO-Scene and TO-Real, plus Web UI, will all be publicly available.
Related papers
- TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image [22.08471897328577]
TabletopGen is a training-free, fully automatic framework that generates diverse, instance-level interactive 3D tabletop scenes.
We show that TabletopGen achieves state-of-the-art performance, markedly surpassing existing methods in visual fidelity, layout accuracy, and physical plausibility.
arXiv Detail & Related papers (2025-12-01T02:38:52Z)
- MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning [97.97174328960807]
The ability of robots to execute manipulation tasks requires the availability of task-relevant tabletop scenes for training.
Traditional methods for creating these scenes rely on time-consuming manual layout design or purely randomized layouts.
We present MesaTask, an LLM-based framework that utilizes a spatial reasoning chain and is further enhanced with DPO algorithms to generate physically plausible tabletop scenes.
arXiv Detail & Related papers (2025-09-26T12:46:00Z)
- MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans [76.39726619818896]
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to support skill acquisition, sim-to-real transfer, and generalization.
Existing datasets demonstrate that this process heavily relies on artist-driven designs.
We present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans.
arXiv Detail & Related papers (2025-05-05T06:13:25Z)
- SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining [100.23919762298227]
We introduce SceneSplat, the first large-scale 3D indoor scene understanding approach that operates on 3DGS.
We also propose a self-supervised learning scheme that unlocks rich 3D feature learning from unlabeled scenes.
SceneSplat-7K is the first large-scale 3DGS dataset for indoor scenes, comprising 6,868 scenes.
arXiv Detail & Related papers (2025-03-23T12:50:25Z)
- CrossOver: 3D Scene Cross-Modal Alignment [78.3057713547313]
CrossOver is a novel framework for cross-modal 3D scene understanding.
It learns a unified, modality-agnostic embedding space for scenes by aligning modalities.
It supports robust scene retrieval and object localization, even with missing modalities.
arXiv Detail & Related papers (2025-02-20T20:05:30Z)
- SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction.
Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories.
arXiv Detail & Related papers (2025-01-28T03:41:24Z)
- Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description [56.69740649781989]
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI.
We introduce Articulate3D, an expertly curated 3D dataset featuring high-quality manual annotations on 280 indoor scenes.
We also present USDNet, a novel unified framework capable of simultaneously predicting part segmentation along with a full specification of motion attributes for articulated objects.
arXiv Detail & Related papers (2024-12-02T11:33:55Z)
- OpenSU3D: Open World 3D Scene Understanding using Foundation Models [2.1262749936758216]
We present a novel, scalable approach for constructing open set, instance-level 3D scene representations.
Existing methods require pre-constructed 3D scenes and face scalability issues due to per-point feature vector learning.
We evaluate our proposed approach on multiple scenes from the ScanNet and Replica datasets, demonstrating zero-shot generalization capabilities.
arXiv Detail & Related papers (2024-07-19T13:01:12Z)
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
- Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training [105.3421541518582]
Current successful methods of 3D scene perception rely on large-scale annotated point clouds.
We propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.
Model2Scene yields impressive label-free 3D object salient detection with an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
arXiv Detail & Related papers (2023-09-29T03:51:26Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- SGAligner: 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning from habitual SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.