SMITE: Segment Me In TimE
- URL: http://arxiv.org/abs/2410.18538v1
- Date: Thu, 24 Oct 2024 08:38:20 GMT
- Title: SMITE: Segment Me In TimE
- Authors: Amirhossein Alimohammadi, Sauradip Nag, Saeid Asgari Taghanaki, Andrea Tagliasacchi, Ghassan Hamarneh, Ali Mahdavi Amiri,
- Abstract summary: We show how to segment an object in a video by employing a pre-trained text to image diffusion model and an additional tracking mechanism.
We demonstrate that our approach can effectively manage various segmentation scenarios and outperforms state-of-the-art alternatives.
- Score: 35.56475607621353
- License:
- Abstract: Segmenting an object in a video presents significant challenges. Each pixel must be accurately labelled, and these labels must remain consistent across frames. The difficulty increases when the segmentation is with arbitrary granularity, meaning the number of segments can vary arbitrarily, and masks are defined based on only one or a few sample images. In this paper, we address this issue by employing a pre-trained text to image diffusion model supplemented with an additional tracking mechanism. We demonstrate that our approach can effectively manage various segmentation scenarios and outperforms state-of-the-art alternatives.
Related papers
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
arXiv Detail & Related papers (2024-02-14T06:01:44Z) - Unsupervised Universal Image Segmentation [59.0383635597103]
We propose an Unsupervised Universal model (U2Seg) adept at performing various image segmentation tasks.
U2Seg generates pseudo semantic labels for these segmentation tasks via leveraging self-supervised models.
We then self-train the model on these pseudo semantic labels, yielding substantial performance gains.
arXiv Detail & Related papers (2023-12-28T18:59:04Z) - SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete
Diffusion Process [102.18226145874007]
We propose a model-agnostic solution called SegRefiner to enhance the quality of object masks produced by different segmentation models.
SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process.
It consistently improves both the segmentation metrics and boundary metrics across different types of coarse masks.
arXiv Detail & Related papers (2023-12-19T18:53:47Z) - Synthetic Instance Segmentation from Semantic Image Segmentation Masks [15.477053085267404]
We propose a novel paradigm called Synthetic Instance (SISeg)
SISeg instance segmentation results by leveraging image masks generated by existing semantic segmentation models.
In other words, the proposed model does not need extra manpower or higher computational expenses.
arXiv Detail & Related papers (2023-08-02T05:13:02Z) - Generative Semantic Segmentation [40.57488730457299]
We present a generative learning approach for semantic segmentation.
Uniquely, we cast semantic segmentation as an image-conditioned mask generation problem.
Experiments show that our GSS can perform competitively to prior art alternatives in the standard semantic segmentation setting.
arXiv Detail & Related papers (2023-03-20T17:55:37Z) - A Generalist Framework for Panoptic Segmentation of Images and Videos [61.61453194912186]
We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive bias of the task.
A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically.
arXiv Detail & Related papers (2022-10-12T16:18:25Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end train bottomable-up approach to achieve instance mask predictions at the pixel-level granularity, instead of the typical region-proposals-based approach.
Our method provides competitive results on YouTube-VIS and DAVIS-19 datasets, and has minimum run-time compared to other contemporary state-of-the-art performance methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Semantic Segmentation In-the-Wild Without Seeing Any Segmentation
Examples [34.97652735163338]
We propose a novel approach for creating semantic segmentation masks for every object.
Our method takes as input the image-level labels of the class categories present in the image.
The output of this stage provides pixel-level pseudo-labels, instead of the manual pixel-level labels required by supervised methods.
arXiv Detail & Related papers (2021-12-06T17:32:38Z) - Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.