A Generalist Framework for Panoptic Segmentation of Images and Videos
- URL: http://arxiv.org/abs/2210.06366v4
- Date: Thu, 12 Oct 2023 22:25:43 GMT
- Title: A Generalist Framework for Panoptic Segmentation of Images and Videos
- Authors: Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
- Abstract summary: We formulate panoptic segmentation as a discrete data generation problem, without relying on the inductive biases of the task.
A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic segmentation assigns semantic and instance ID labels to every pixel
of an image. As permutations of instance IDs are also valid solutions, the task
requires learning a high-dimensional one-to-many mapping. As a result,
state-of-the-art approaches use customized architectures and task-specific loss
functions. We formulate panoptic segmentation as a discrete data generation
problem, without relying on the inductive biases of the task. A diffusion model is
proposed to model panoptic masks, with a simple architecture and generic loss
function. By simply adding past predictions as a conditioning signal, our
method is capable of modeling video (in a streaming setting) and thereby learns
to track object instances automatically. With extensive experiments, we
demonstrate that our simple approach performs competitively with
state-of-the-art specialist methods in similar settings.
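The key idea behind treating panoptic masks as discrete data for a continuous diffusion model is to represent integer instance/semantic IDs as "analog bits" (real-valued bit vectors), as in the Bit Diffusion formulation this paper builds on. The following is a minimal sketch of that encoding and its inverse, not the authors' implementation; the function names and the 8-bit width are illustrative assumptions.

```python
import numpy as np

def ids_to_analog_bits(ids, num_bits=8, scale=1.0):
    """Encode integer panoptic IDs as 'analog bits' in {-scale, +scale}.

    Each ID becomes a num_bits-dimensional real vector, so a discrete
    mask can be modeled by a continuous (Gaussian) diffusion process.
    """
    ids = np.asarray(ids, dtype=np.int64)
    # Extract bits in little-endian order via broadcasting.
    bits = (ids[..., None] >> np.arange(num_bits)) & 1
    # Map {0, 1} -> {-scale, +scale}.
    return (bits.astype(np.float32) * 2.0 - 1.0) * scale

def analog_bits_to_ids(x, num_bits=8):
    """Decode denoised analog bits back to integer IDs by thresholding at zero."""
    bits = (np.asarray(x) > 0).astype(np.int64)
    return (bits << np.arange(num_bits)).sum(axis=-1)

# Round trip on a toy 2x2 "panoptic mask" of instance IDs.
mask = np.array([[3, 7], [0, 255]])
encoded = ids_to_analog_bits(mask)   # shape (2, 2, 8), values in {-1, +1}
decoded = analog_bits_to_ids(encoded)
assert (decoded == mask).all()
```

At training time the model denoises noisy analog bits conditioned on the image (and, for video, on past predictions); at inference, thresholding the final sample recovers a discrete mask.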
Related papers
- Depth-aware Panoptic Segmentation
We present a novel CNN-based method for panoptic segmentation.
We propose a new depth-aware dice loss term which penalises the assignment of pixels at different depths to the same thing instance.
Experiments carried out on the Cityscapes dataset show that the proposed method reduces the number of objects that are erroneously merged into one thing instance.
arXiv Detail & Related papers (2024-03-21T08:06:49Z) - Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z) - Self-Supervised Instance Segmentation by Grasping
We learn a grasp segmentation model to segment the grasped object from before and after grasp images.
Using the segmented objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision.
We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches.
arXiv Detail & Related papers (2023-05-10T16:51:36Z) - Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z) - Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z) - Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) method for face parsing.
Specifically, DML-CSR designs a multi-task model comprising face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z) - Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation
State-of-the-art panoptic segmentation methods use complex models with a distinct stream for each task.
We propose Hierarchical Lovász Embeddings, per-pixel feature vectors that simultaneously encode instance- and category-level discriminative information.
Our model achieves state-of-the-art results compared to existing proposal-free panoptic segmentation methods on Cityscapes, COCO, and Mapillary Vistas.
arXiv Detail & Related papers (2021-06-08T17:43:54Z) - BoundarySqueeze: Image Segmentation as Boundary Squeezing
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
arXiv Detail & Related papers (2021-05-25T04:58:51Z) - Unsupervised Layered Image Decomposition into Object Prototypes
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models.
We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks.
We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images.
arXiv Detail & Related papers (2021-04-29T18:02:01Z) - SMILE: Semantically-guided Multi-attribute Image and Layout Editing
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z)