ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object
Segmentation
- URL: http://arxiv.org/abs/2111.10265v1
- Date: Fri, 19 Nov 2021 15:11:25 GMT
- Title: ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object
Segmentation
- Authors: Laurynas Karazija, Iro Laina, Christian Rupprecht
- Abstract summary: We present ClevrTex, designed as the next challenge to compare, evaluate and analyze algorithms.
ClevrTex features synthetic scenes with diverse shapes, textures and photo-mapped materials, created using physically based rendering techniques.
We benchmark a large set of recent unsupervised multi-object segmentation models on ClevrTex and find all state-of-the-art approaches fail to learn good representations in the textured setting.
- Score: 23.767094632640763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a recent surge in methods that aim to decompose and segment
scenes into multiple objects in an unsupervised manner, i.e., unsupervised
multi-object segmentation. Performing such a task is a long-standing goal of
computer vision, offering to unlock object-level reasoning without requiring
dense annotations to train segmentation models. Despite significant progress,
current models are developed and trained on visually simple scenes depicting
mono-colored objects on plain backgrounds. The natural world, however, is
visually complex with confounding aspects such as diverse textures and
complicated lighting effects. In this study, we present a new benchmark called
ClevrTex, designed as the next challenge to compare, evaluate and analyze
algorithms. ClevrTex features synthetic scenes with diverse shapes, textures
and photo-mapped materials, created using physically based rendering
techniques. It includes 50k examples depicting 3-10 objects arranged on a
background, created using a catalog of 60 materials, and a further test set
featuring 10k images created using 25 different materials. We benchmark a large
set of recent unsupervised multi-object segmentation models on ClevrTex and
find all state-of-the-art approaches fail to learn good representations in the
textured setting, despite impressive performance on simpler data. We also
create variants of the ClevrTex dataset, controlling for different aspects of
scene complexity, and probe current approaches for individual shortcomings.
Dataset and code are available at
https://www.robots.ox.ac.uk/~vgg/research/clevrtex.
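The split sizes quoted in the abstract can be written down as a small configuration sketch. This is only an illustration of the stated figures (50k images, 3-10 objects, 60 materials; 10k test images, 25 materials), not the authors' generation code. The names `SplitSpec`, `MAIN`, `TEST` and `sample_object_materials` are hypothetical, and reusing the 3-10 object range for the test set is an assumption, since the abstract does not state it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSpec:
    """Illustrative description of one dataset split (hypothetical names)."""
    name: str
    num_images: int
    num_materials: int
    min_objects: int = 3   # from the abstract (main set)
    max_objects: int = 10  # from the abstract (main set)


# Figures taken from the abstract. The object range for TEST is an assumption.
MAIN = SplitSpec("main", 50_000, 60)
TEST = SplitSpec("test", 10_000, 25)


def sample_object_materials(spec: SplitSpec, rng: random.Random) -> list[int]:
    """Draw a material index for each object in one hypothetical scene."""
    num_objects = rng.randint(spec.min_objects, spec.max_objects)  # inclusive
    return [rng.randrange(spec.num_materials) for _ in range(num_objects)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(MAIN, sample_object_materials(MAIN, rng))
    print(TEST, sample_object_materials(TEST, rng))
```

Uniform sampling of materials is used here purely for simplicity; the actual sampling procedure is described in the paper and the released code.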
Related papers
- MegaScenes: Scene-Level View Synthesis at Scale [69.21293001231993]
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications.
We create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world.
We analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency.
arXiv Detail & Related papers (2024-06-17T17:55:55Z)
- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z)
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity
Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume.
We construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and strict camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- A Comprehensive Review of Modern Object Segmentation Approaches [1.7041248235270654]
Image segmentation is the task of associating pixels in an image with their respective object class labels.
Deep learning-based approaches have been developed for image-level object recognition and pixel-level scene understanding.
Extensions of image segmentation tasks include 3D and video segmentation, where units of voxels, point clouds, and video frames are classified into different objects.
arXiv Detail & Related papers (2023-01-13T19:35:46Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for
Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with the global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- Breaking the "Object" in Video Object Segmentation [36.20167854011788]
We present a dataset for Video Object Segmentation under Transformations (VOST).
It consists of more than 700 high-resolution videos, captured in diverse environments, which are 21 seconds long on average and densely labeled with instance masks.
A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent.
We show that existing methods struggle when applied to this novel task and that their main limitation lies in over-reliance on static appearance cues.
arXiv Detail & Related papers (2022-12-12T19:22:17Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)