A Simple and Generalist Approach for Panoptic Segmentation
- URL: http://arxiv.org/abs/2408.16504v1
- Date: Thu, 29 Aug 2024 13:02:12 GMT
- Title: A Simple and Generalist Approach for Panoptic Segmentation
- Authors: Nedyalko Prisadnikov, Wouter Van Gansbeke, Danda Pani Paudel, Luc Van Gool,
- Abstract summary: Generalist vision models aim for one and the same architecture for a variety of vision tasks.
While such shared architecture may seem attractive, generalist models tend to be outperformed by their bespoken counterparts.
We address this problem by introducing two key contributions, without compromising the desirable properties of generalist models.
- Score: 57.94892855772925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalist vision models aim for one and the same architecture for a variety of vision tasks. While such shared architecture may seem attractive, generalist models tend to be outperformed by their bespoken counterparts, especially in the case of panoptic segmentation. We address this problem by introducing two key contributions, without compromising the desirable properties of generalist models. These contributions are: (i) a positional-embedding (PE) based loss for improved centroid regressions; (ii) Edge Distance Sampling (EDS) for the better separation of instance boundaries. The PE-based loss facilitates a better per-pixel regression of the associated instance's centroid, whereas EDS contributes by carefully handling the void regions (caused by missing labels) and smaller instances. These two simple yet effective modifications significantly improve established baselines, while achieving state-of-the-art results among all generalist solutions. More specifically, our method achieves a panoptic quality(PQ) of 52.5 on the COCO dataset, which is an improvement of 10 points over the best model with similar approach (Painter), and is superior by 2 to the best performing diffusion-based method Pix2Seq-$\mathcal{D}$. Furthermore, we provide insights into and an in-depth analysis of our contributions through exhaustive experiments. Our source code and model weights will be made publicly available.
Related papers
- Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space.
We propose an effective approach to narrow the gap between the two domains.
It mainly facilitates unified mutual information sharing both intra- and inter-samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z) - Panoptic Out-of-Distribution Segmentation [11.388678390784195]
We propose Panoptic Out-of Distribution for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction.
We make the dataset, code, and trained models publicly available at http://pods.cs.uni-freiburg.de.
arXiv Detail & Related papers (2023-10-18T08:38:31Z) - SwIPE: Efficient and Robust Medical Image Segmentation with Implicit Patch Embeddings [12.79344668998054]
We propose SwIPE (Segmentation with Implicit Patch Embeddings) to enable accurate local boundary delineation and global shape coherence.
We show that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters.
arXiv Detail & Related papers (2023-07-23T20:55:11Z) - Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons"
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z) - Interactive Segmentation as Gaussian Process Classification [58.44673380545409]
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction.
Most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation.
We propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image.
arXiv Detail & Related papers (2023-02-28T14:01:01Z) - Rethinking Semi-Supervised Medical Image Segmentation: A
Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - Improving Generalization in Federated Learning by Seeking Flat Minima [23.937135834522145]
Models trained in federated settings often suffer from degraded performances and fail at generalizing.
In this work, we investigate such behavior through the lens of geometry of the loss and Hessian eigenspectrum.
Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) on the server-side can substantially improve generalization.
arXiv Detail & Related papers (2022-03-22T16:01:04Z) - DONet: Dual Objective Networks for Skin Lesion Segmentation [77.9806410198298]
We propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve the skin lesion segmentation.
Our DONet adopts two symmetric decoders to produce different predictions for approaching different objectives.
To address the challenge of large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM)
arXiv Detail & Related papers (2020-08-19T06:02:46Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.