A Generalist Framework for Panoptic Segmentation of Images and Videos
- URL: http://arxiv.org/abs/2210.06366v4
- Date: Thu, 12 Oct 2023 22:25:43 GMT
- Title: A Generalist Framework for Panoptic Segmentation of Images and Videos
- Authors: Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
- Abstract summary: We formulate panoptic segmentation as a discrete data generation problem, without relying on the inductive biases of the task.
A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic segmentation assigns semantic and instance ID labels to every pixel
of an image. As permutations of instance IDs are also valid solutions, the task
requires learning a high-dimensional one-to-many mapping. As a result,
state-of-the-art approaches use customized architectures and task-specific loss
functions. We formulate panoptic segmentation as a discrete data generation
problem, without relying on the inductive biases of the task. A diffusion model is
proposed to model panoptic masks, with a simple architecture and generic loss
function. By simply adding past predictions as a conditioning signal, our
method is capable of modeling video (in a streaming setting) and thereby learns
to track object instances automatically. With extensive experiments, we
demonstrate that our simple approach performs competitively with
state-of-the-art specialist methods in similar settings.
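The key idea behind treating panoptic masks as discrete data for a continuous diffusion model is to represent integer instance/semantic IDs as "analog bits" (real-valued bit vectors), as in the Bit Diffusion formulation this paper builds on. The following is a minimal sketch of that encoding and its inverse, not the authors' implementation; the function names and the 8-bit width are illustrative assumptions.

```python
import numpy as np

def ids_to_analog_bits(ids, num_bits=8, scale=1.0):
    """Encode integer panoptic IDs as 'analog bits' in {-scale, +scale}.

    Each ID becomes a num_bits-dimensional real vector, so a discrete
    mask can be modeled by a continuous (Gaussian) diffusion process.
    """
    ids = np.asarray(ids, dtype=np.int64)
    # Extract bits in little-endian order via broadcasting.
    bits = (ids[..., None] >> np.arange(num_bits)) & 1
    # Map {0, 1} -> {-scale, +scale}.
    return (bits.astype(np.float32) * 2.0 - 1.0) * scale

def analog_bits_to_ids(x, num_bits=8):
    """Decode denoised analog bits back to integer IDs by thresholding at zero."""
    bits = (np.asarray(x) > 0).astype(np.int64)
    return (bits << np.arange(num_bits)).sum(axis=-1)

# Round trip on a toy 2x2 "panoptic mask" of instance IDs.
mask = np.array([[3, 7], [0, 255]])
encoded = ids_to_analog_bits(mask)   # shape (2, 2, 8), values in {-1, +1}
decoded = analog_bits_to_ids(encoded)
assert (decoded == mask).all()
```

At training time the model denoises noisy analog bits conditioned on the image (and, for video, on past predictions); at inference, thresholding the final sample recovers a discrete mask.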
Related papers
- Depth-aware Panoptic Segmentation
We present a novel CNN-based method for panoptic segmentation.
We propose a new depth-aware dice loss term which penalises the assignment of pixels at different depths to the same thing instance.
Experiments carried out on the Cityscapes dataset show that the proposed method reduces the number of objects that are erroneously merged into one thing instance.
arXiv Detail & Related papers (2024-03-21T08:06:49Z) - Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z) - Self-Supervised Instance Segmentation by Grasping
We learn a grasp segmentation model to segment the grasped object from before and after grasp images.
Using the segmented objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision.
We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches.
arXiv Detail & Related papers (2023-05-10T16:51:36Z) - Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z) - Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z) - Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) method for face parsing.
Specifically, DML-CSR designs a multi-task model comprising face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z) - Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation
State-of-the-art panoptic segmentation methods use complex models with a distinct stream for each task.
We propose Hierarchical Lovász Embeddings, per-pixel feature vectors that simultaneously encode instance- and category-level discriminative information.
Our model achieves state-of-the-art results compared to existing proposal-free panoptic segmentation methods on Cityscapes, COCO, and Mapillary Vistas.
arXiv Detail & Related papers (2021-06-08T17:43:54Z) - BoundarySqueeze: Image Segmentation as Boundary Squeezing
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
arXiv Detail & Related papers (2021-05-25T04:58:51Z) - Unsupervised Layered Image Decomposition into Object Prototypes
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models.
We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks.
We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images.
arXiv Detail & Related papers (2021-04-29T18:02:01Z) - SMILE: Semantically-guided Multi-attribute Image and Layout Editing
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z)