Related papers: A Simple and Generalist Approach for Panoptic Segmentation

A Simple and Generalist Approach for Panoptic Segmentation

URL: http://arxiv.org/abs/2408.16504v1
Date: Thu, 29 Aug 2024 13:02:12 GMT
Title: A Simple and Generalist Approach for Panoptic Segmentation
Authors: Nedyalko Prisadnikov, Wouter Van Gansbeke, Danda Pani Paudel, Luc Van Gool,
Abstract summary: Generalist vision models aim for one and the same architecture for a variety of vision tasks. While such shared architecture may seem attractive, generalist models tend to be outperformed by their bespoken counterparts. We address this problem by introducing two key contributions, without compromising the desirable properties of generalist models.
Score: 57.94892855772925
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generalist vision models aim for one and the same architecture for a variety of vision tasks. While such shared architecture may seem attractive, generalist models tend to be outperformed by their bespoken counterparts, especially in the case of panoptic segmentation. We address this problem by introducing two key contributions, without compromising the desirable properties of generalist models. These contributions are: (i) a positional-embedding (PE) based loss for improved centroid regressions; (ii) Edge Distance Sampling (EDS) for the better separation of instance boundaries. The PE-based loss facilitates a better per-pixel regression of the associated instance's centroid, whereas EDS contributes by carefully handling the void regions (caused by missing labels) and smaller instances. These two simple yet effective modifications significantly improve established baselines, while achieving state-of-the-art results among all generalist solutions. More specifically, our method achieves a panoptic quality(PQ) of 52.5 on the COCO dataset, which is an improvement of 10 points over the best model with similar approach (Painter), and is superior by 2 to the best performing diffusion-based method Pix2Seq-$\mathcal{D}$. Furthermore, we provide insights into and an in-depth analysis of our contributions through exhaustive experiments. Our source code and model weights will be made publicly available.

Related papers

Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space. We propose an effective approach to narrow the gap between the two domains. It mainly facilitates unified mutual information sharing both intra- and inter-samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
Space-Variant Total Variation boosted by learning techniques in few-view tomographic imaging [0.0]
This paper focuses on the development of a space-variant regularization model for solving an under-determined linear inverse problem. The primary objective of the proposed model is to achieve a good balance between denoising and the preservation of fine details and edges. A convolutional neural network is designed, to approximate both the ground truth image and its gradient using an elastic loss function in its training.
arXiv Detail & Related papers (2024-04-25T08:58:41Z)
Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models [1.8142288667655782]
We propose a prompt-less segmentation method harnessing the ability of segmentation foundation models to segment abstract shapes. Our method's advantages are brought to light in experiments on a small-scale musculoskeletal ultrasound images dataset.
arXiv Detail & Related papers (2024-04-25T04:21:57Z)
Panoptic Out-of-Distribution Segmentation [11.388678390784195]
We propose Panoptic Out-of Distribution for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction. We make the dataset, code, and trained models publicly available at http://pods.cs.uni-freiburg.de.
arXiv Detail & Related papers (2023-10-18T08:38:31Z)
SwIPE: Efficient and Robust Medical Image Segmentation with Implicit Patch Embeddings [12.79344668998054]
We propose SwIPE (Segmentation with Implicit Patch Embeddings) to enable accurate local boundary delineation and global shape coherence. We show that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters.
arXiv Detail & Related papers (2023-07-23T20:55:11Z)
Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons" We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets. Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z)
Interactive Segmentation as Gaussian Process Classification [58.44673380545409]
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction. Most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation. We propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image.
arXiv Detail & Related papers (2023-02-28T14:01:01Z)
Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation. We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks. We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
Improving Generalization in Federated Learning by Seeking Flat Minima [23.937135834522145]
Models trained in federated settings often suffer from degraded performances and fail at generalizing. In this work, we investigate such behavior through the lens of geometry of the loss and Hessian eigenspectrum. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) on the server-side can substantially improve generalization.
arXiv Detail & Related papers (2022-03-22T16:01:04Z)
A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition working well on broader vision problems, such as object detection and semantic segmentation. In this paper, we target for zero-shot semantic segmentation, by building it on an off-the-shelf pre-trained vision-language model, i.e., CLIP. Our experimental results show that this simple framework surpasses previous state-of-the-arts by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
Dense Gaussian Processes for Few-Shot Segmentation [66.08463078545306]
We propose a few-shot segmentation method based on dense Gaussian process (GP) regression. We exploit the end-to-end learning capabilities of our approach to learn a high-dimensional output space for the GP. Our approach sets a new state-of-the-art for both 1-shot and 5-shot FSS on the PASCAL-5$i$ and COCO-20$i$ benchmarks.
arXiv Detail & Related papers (2021-10-07T17:57:54Z)
Deep Gaussian Processes for Few-Shot Segmentation [66.08463078545306]
Few-shot segmentation is a challenging task, requiring the extraction of a generalizable representation from only a few annotated samples. We propose a few-shot learner formulation based on Gaussian process (GP) regression. Our approach sets a new state-of-the-art for 5-shot segmentation, with mIoU scores of 68.1 and 49.8 on PASCAL-5i and COCO-20i, respectively.
arXiv Detail & Related papers (2021-03-30T17:56:32Z)
Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding [13.638411611516172]
This work focuses on reducing the complexity of attention-based architectures for chroma intra-prediction. A novel size-agnostic multi-model approach is proposed to reduce the complexity of the inference process. A collection of simplifications is presented in this paper, to further reduce the complexity overhead of the proposed prediction architecture.
arXiv Detail & Related papers (2021-02-09T18:01:22Z)
DONet: Dual Objective Networks for Skin Lesion Segmentation [77.9806410198298]
We propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve the skin lesion segmentation. Our DONet adopts two symmetric decoders to produce different predictions for approaching different objectives. To address the challenge of large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM)
arXiv Detail & Related papers (2020-08-19T06:02:46Z)
Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results. Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
The Power of Triply Complementary Priors for Image Compressive Sensing [89.14144796591685]
We propose a joint low-rank deep (LRD) image model, which contains a pair of complementaryly trip priors. We then propose a novel hybrid plug-and-play framework based on the LRD model for image CS. To make the optimization tractable, a simple yet effective algorithm is proposed to solve the proposed H-based image CS problem.
arXiv Detail & Related papers (2020-05-16T08:17:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.