FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
- URL: http://arxiv.org/abs/2303.08594v2
- Date: Sat, 1 Apr 2023 17:55:21 GMT
- Title: FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
- Authors: Junjie He, Pengyu Li, Yifeng Geng, Xuansong Xie
- Abstract summary: We present FastInst, a query-based framework for real-time instance segmentation.
FastInst can execute at a real-time speed (i.e., 32.5 FPS) while yielding an AP of more than 40.
Experiments show that FastInst outperforms most state-of-the-art real-time counterparts.
- Score: 17.551277435319083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent attention in instance segmentation has focused on query-based models.
Although these models are non-maximum suppression (NMS)-free and end-to-end,
their superiority on high-accuracy real-time benchmarks has not been
well demonstrated. In this paper, we show the strong potential of query-based
models on efficient instance segmentation algorithm designs. We present
FastInst, a simple, effective query-based framework for real-time instance
segmentation. FastInst can execute at a real-time speed (i.e., 32.5 FPS) while
yielding an AP of more than 40 (i.e., 40.5 AP) on COCO test-dev without bells
and whistles. Specifically, FastInst follows the meta-architecture of recently
introduced Mask2Former. Its key designs include instance activation-guided
queries, dual-path update strategy, and ground truth mask-guided learning,
which enable us to use lighter pixel decoders and fewer Transformer decoder
layers while achieving better performance. The experiments show that FastInst
outperforms most state-of-the-art real-time counterparts, including strong
fully convolutional baselines, in both speed and accuracy. Code can be found at
https://github.com/junjiehe96/FastInst.
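The abstract names instance activation-guided queries without describing them. As we read the paper, the idea is to initialize the decoder queries from pixel features that an auxiliary per-pixel classifier scores highly, instead of using fixed learned query embeddings. The following is a minimal plain-Python sketch of that idea under stated assumptions: the 3x3 local-maximum filter, the top-k rule, and the helper names `select_query_locations` and `gather_queries` are hypothetical illustrations, not code from the FastInst repository.

```python
from typing import List, Tuple

# [H][W][C] per-pixel class probabilities (assumed output of an auxiliary head).
Grid = List[List[List[float]]]


def select_query_locations(probs: Grid, k: int) -> List[Tuple[int, int, int, float]]:
    """Hypothetical helper: pick up to k pixel locations to seed instance queries.

    Each pixel is scored by its highest class probability, and it is kept only
    if that score is a local maximum (3x3 neighbourhood) for the same class.
    Returns (row, col, class, score) tuples sorted by descending score.
    """
    h, w = len(probs), len(probs[0])
    candidates = []
    for r in range(h):
        for c in range(w):
            # Most likely class at this pixel and its probability.
            cls = max(range(len(probs[r][c])), key=lambda j: probs[r][c][j])
            score = probs[r][c][cls]
            # Probabilities of the same class at the (up to 8) neighbours.
            neighbours = (
                probs[rr][cc][cls]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            )
            if all(score >= n for n in neighbours):
                candidates.append((r, c, cls, score))
    # Keep the k highest-scoring local maxima.
    candidates.sort(key=lambda t: t[3], reverse=True)
    return candidates[:k]


def gather_queries(
    features: List[List[List[float]]], locations: List[Tuple[int, int, int, float]]
) -> List[List[float]]:
    """Hypothetical helper: use the pixel embeddings at the chosen locations as initial queries."""
    return [features[r][c] for r, c, _, _ in locations]
```

The local-maximum check is meant to keep several queries from landing on the same object; plain top-k over all pixels would be simpler but would likely yield near-duplicate queries from one high-scoring region.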
Related papers
- Efficient Temporal Action Segmentation via Boundary-aware Query Voting [51.92693641176378]
BaFormer is a boundary-aware Transformer network that tokenizes each video segment as an instance token.
BaFormer significantly reduces the computational costs, utilizing only 6% of the running time.
(2024-05-25T00:44:13Z)
- Sparse Instance Activation for Real-Time Instance Segmentation [72.23597664935684]
We propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.
SparseInst has extremely fast inference speed and achieves 40 FPS and 37.9 AP on the COCO benchmark.
(2022-03-24T03:15:39Z)
- FastSeq: Make Sequence Generation Faster [20.920579109726024]
We develop the FastSeq framework to accelerate sequence generation without accuracy loss.
Benchmark results on a set of widely used and diverse models demonstrate a 4-9x inference speed gain.
FastSeq is easy to use with a simple one-line code change.
(2021-06-08T22:25:28Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
(2021-05-27T13:51:42Z)
- QueryInst: Parallelly Supervised Mask Query for Instance Segmentation [53.5613957875507]
We present QueryInst, a query-based instance segmentation method driven by parallel supervision on dynamic mask heads.
We conduct extensive experiments on three challenging benchmarks, i.e., COCO, CityScapes, and YouTube-VIS.
QueryInst achieves the best performance among all online VIS approaches and strikes a decent speed-accuracy trade-off.
(2021-05-05T08:38:25Z)
- Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition [11.6409723227448]
Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing.
We develop an efficient algorithm to search for fast models while maintaining model quality.
(2020-08-15T23:12:25Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two layers in CNNs can be converted to temporal bilinear modules by adding auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something v1 and v2 datasets without pretraining.
(2020-07-25T09:07:35Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is given only at test time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame rates and a risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
(2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.