PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
- URL: http://arxiv.org/abs/2302.07387v2
- Date: Mon, 27 Mar 2023 23:22:31 GMT
- Title: PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
- Authors: Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha
- Abstract summary: Instead of directly predicting pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation.
This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input.
For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly.
- Score: 20.55281741205142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, instead of directly predicting the pixel-level segmentation
masks, the problem of referring image segmentation is formulated as sequential
polygon generation, and the predicted polygons can be later converted into
segmentation masks. This is enabled by a new sequence-to-sequence framework,
Polygon Transformer (PolyFormer), which takes a sequence of image patches and
text query tokens as input, and outputs a sequence of polygon vertices
autoregressively. For more accurate geometric localization, we propose a
regression-based decoder, which predicts the precise floating-point coordinates
directly, without any coordinate quantization error. In the experiments,
PolyFormer outperforms the prior art by a clear margin, e.g., 5.40% and 4.52%
absolute improvements on the challenging RefCOCO+ and RefCOCOg datasets. It
also shows strong generalization ability when evaluated on the referring video
segmentation task without fine-tuning, e.g., achieving a competitive 61.5% J&F on
the Ref-DAVIS17 dataset.
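The polygon-to-mask conversion mentioned in the abstract is simple to realize in code. Below is a minimal sketch (our illustration, not the authors' implementation; the helper name and the choice of PIL are assumptions) that rasterizes predicted floating-point polygon vertices into a binary mask:

```python
import numpy as np
from PIL import Image, ImageDraw

def polygons_to_mask(polygons, height, width):
    """Rasterize a list of polygons into one binary segmentation mask.

    `polygons` holds [(x0, y0), (x1, y1), ...] vertex lists in
    floating-point image coordinates, as a polygon decoder would emit.
    """
    canvas = Image.new("1", (width, height), 0)
    draw = ImageDraw.Draw(canvas)
    for vertices in polygons:
        if len(vertices) >= 3:  # need at least a triangle to fill a region
            draw.polygon([(float(x), float(y)) for x, y in vertices], fill=1)
    return np.array(canvas, dtype=np.uint8)

# A referred object made of one quadrilateral part:
mask = polygons_to_mask(
    [[(10.5, 12.0), (80.2, 15.3), (75.0, 90.8), (12.1, 85.4)]], 128, 128
)
print(mask.shape, mask.sum())  # (128, 128) and the filled-pixel count
```

Objects with several parts are handled naturally by rasterizing each polygon onto the same canvas.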
Related papers
- Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text [27.556486778356014]
Transformer-based text detection techniques have sought to predict polygons.
We present an innovative approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon prediction.
Our method ensures precision by iteratively refining polygon predictions, considering both the scale and location of preceding results.
arXiv Detail & Related papers (2023-09-20T12:19:07Z)
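To make the cascade idea concrete, here is a toy PyTorch sketch of iterative polygon refinement in which each stage predicts vertex offsets on top of the previous stage's polygon. All class, layer, and size choices are hypothetical; Box2Poly itself builds on Sparse R-CNN and additionally conditions on the scale and location of preceding results.

```python
import torch
import torch.nn as nn

class CascadePolygonRefiner(nn.Module):
    """Toy cascade: each stage maps (ROI feature, current polygon) to
    per-vertex offsets that refine the polygon."""

    def __init__(self, num_vertices=16, feat_dim=256, num_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Linear(feat_dim + 2 * num_vertices, feat_dim),
                nn.ReLU(),
                nn.Linear(feat_dim, 2 * num_vertices),
            )
            for _ in range(num_stages)
        )

    def forward(self, roi_feat, init_polygon):
        # roi_feat: (B, feat_dim); init_polygon: (B, num_vertices, 2)
        poly = init_polygon
        for stage in self.stages:
            flat = poly.flatten(1)  # (B, 2 * num_vertices)
            offsets = stage(torch.cat([roi_feat, flat], dim=1))
            poly = poly + offsets.view_as(poly)  # iterative refinement
        return poly
```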
- Real-time instance segmentation with polygons using an Intersection-over-Union loss [13.020122353444497]
We improve over CenterPoly by enhancing the classical regression L1 loss with a novel region-based loss and a novel order loss.
Experiments show that combining a regression loss with a region-based loss yields significant improvements on the Cityscapes and IDD test sets.
arXiv Detail & Related papers (2023-05-09T14:43:38Z)
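The pairing of a vertex regression loss with a region-based term can be illustrated as below (the order loss is omitted). The sketch uses shapely to compute polygon IoU, which is not differentiable; an actual training loss would substitute a differentiable surrogate such as soft rasterization.

```python
import numpy as np
from shapely.geometry import Polygon

def combined_polygon_loss(pred, gt, region_weight=1.0):
    """L1 vertex regression plus a region term of the form 1 - IoU.

    pred, gt: (V, 2) arrays with matched vertex order.
    """
    l1 = np.abs(pred - gt).mean()
    # buffer(0) repairs minor self-intersections in predicted polygons
    p, g = Polygon(pred).buffer(0), Polygon(gt).buffer(0)
    union = p.union(g).area
    iou = p.intersection(g).area / union if union > 0 else 0.0
    return l1 + region_weight * (1.0 - iou)
```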
- Recurrent Generic Contour-based Instance Segmentation with Progressive Learning [111.31166268300817]
We propose a novel deep network architecture, i.e., PolySnake, for generic contour-based instance segmentation.
Motivated by the classic Snake algorithm, the proposed PolySnake achieves superior and robust segmentation performance.
arXiv Detail & Related papers (2023-01-21T05:34:29Z)
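A toy version of the recurrent contour refinement: one shared update module deforms the contour for a fixed number of iterations. In a real implementation the vertex features would be re-sampled from the image at the current vertex positions at each step; here they stay fixed for brevity, and all names and sizes are ours.

```python
import torch
import torch.nn as nn

class RecurrentContourUpdater(nn.Module):
    """A shared 1D conv over the circular vertex sequence predicts
    2D offsets, applied repeatedly to refine the contour."""

    def __init__(self, feat_dim=64, num_iters=6):
        super().__init__()
        self.num_iters = num_iters
        self.update = nn.Conv1d(feat_dim + 2, 2, kernel_size=3,
                                padding=1, padding_mode="circular")

    def forward(self, vertex_feats, contour):
        # vertex_feats: (B, feat_dim, V); contour: (B, V, 2)
        for _ in range(self.num_iters):
            x = torch.cat([vertex_feats, contour.transpose(1, 2)], dim=1)
            contour = contour + self.update(x).transpose(1, 2)
        return contour
```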
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It is a traditional U-shaped encoder-decoder structure augmented with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
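The dynamic-kernel idea can be sketched as generating a convolution kernel from global image context and applying it per sample through a grouped convolution. This is a simplified stand-in, not LDNet's actual kernel generation and updating scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKernelHead(nn.Module):
    """Generate a per-image conv kernel from pooled context, then apply
    each sample's kernel to its own feature map via groups=batch."""

    def __init__(self, feat_dim=64, ksize=3):
        super().__init__()
        self.ksize = ksize
        self.gen = nn.Linear(feat_dim, feat_dim * ksize * ksize)

    def forward(self, feat):
        # feat: (B, C, H, W) decoder features
        b, c, h, w = feat.shape
        ctx = feat.mean(dim=(2, 3))  # global context, (B, C)
        kernels = self.gen(ctx).view(b, c, self.ksize, self.ksize)
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels,
                       padding=self.ksize // 2, groups=b)
        return out.view(b, 1, h, w)  # one lesion logit map per image
```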
- End-to-End Segmentation via Patch-wise Polygons Prediction [93.91375268580806]
The leading segmentation methods represent the output map as a pixel grid.
We study an alternative representation in which the object edges are modeled, per image patch, as a polygon with $k$ vertices that is coupled with per-patch label probabilities.
arXiv Detail & Related papers (2021-12-05T10:42:40Z)
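A minimal head for such a patch-wise representation might predict k vertices in patch-local coordinates plus label probabilities for every patch token; the names and sizes below are purely illustrative.

```python
import torch
import torch.nn as nn

class PatchPolygonHead(nn.Module):
    """Per patch token: k (x, y) vertices in [0, 1] patch coordinates
    and a distribution over class labels."""

    def __init__(self, feat_dim=256, k=8, num_classes=21):
        super().__init__()
        self.vertex_head = nn.Linear(feat_dim, 2 * k)
        self.label_head = nn.Linear(feat_dim, num_classes)

    def forward(self, patch_tokens):
        # patch_tokens: (B, num_patches, feat_dim), e.g. from a ViT encoder
        verts = self.vertex_head(patch_tokens).sigmoid()
        probs = self.label_head(patch_tokens).softmax(dim=-1)
        return verts.unflatten(-1, (-1, 2)), probs  # (B, P, k, 2), (B, P, classes)
```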
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par with it on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
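In the spirit of Segmenter's simplest decoder, a linear per-patch classifier over ViT tokens can be sketched as follows (class and argument names are our assumptions; the paper's stronger variant uses a mask transformer instead):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearPatchDecoder(nn.Module):
    """Classify every patch token, reshape to a grid, and upsample
    the patch-level logits back to pixel resolution."""

    def __init__(self, embed_dim=768, patch_size=16, num_classes=150):
        super().__init__()
        self.patch_size = patch_size
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens, grid_hw):
        # tokens: (B, N, embed_dim) patch tokens; grid_hw: (H/P, W/P), N = H/P * W/P
        gh, gw = grid_hw
        logits = self.classifier(tokens)  # (B, N, num_classes)
        logits = logits.transpose(1, 2).reshape(logits.size(0), -1, gh, gw)
        return F.interpolate(logits, scale_factor=self.patch_size,
                             mode="bilinear", align_corners=False)
```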
- Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
The basic architecture in image segmentation consists of an encoder and a decoder.
We compare several variants of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
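At inference time such an ensemble reduces to averaging per-pixel class probabilities across the member networks; a generic sketch, not tied to the paper's specific DeepLab variants:

```python
import torch

@torch.no_grad()
def ensemble_segmentation(models, image):
    """Average softmax maps over an ensemble of segmentation networks
    and return the per-pixel argmax labels."""
    probs = [model(image).softmax(dim=1) for model in models]  # each (B, C, H, W)
    return torch.stack(probs).mean(dim=0).argmax(dim=1)        # (B, H, W)
```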
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
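A minimal sequence-to-sequence segmenter along these lines: patchify the image, encode the patch sequence with a transformer, reshape the tokens back into a 2D map, and decode with a simple head. This is a toy sketch with illustrative sizes, not the SETR implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySETR(nn.Module):
    def __init__(self, patch=16, dim=256, depth=4, num_classes=19):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x):                           # x: (B, 3, H, W)
        feat = self.embed(x)                        # (B, dim, H/P, W/P)
        b, d, gh, gw = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)    # (B, N, dim) patch sequence
        tokens = self.encoder(tokens)               # global context in every layer
        feat = tokens.transpose(1, 2).reshape(b, d, gh, gw)
        return F.interpolate(self.head(feat), scale_factor=self.patch,
                             mode="bilinear", align_corners=False)
```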
- Polygonal Building Segmentation by Frame Field Learning [37.86051935654666]
We bridge the gap between deep network output and the format used in downstream tasks by adding a frame field output to a deep segmentation model.
We train a deep neural network that aligns a predicted frame field to ground truth contours.
arXiv Detail & Related papers (2020-04-30T15:21:56Z)
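As a greatly simplified illustration of aligning a predicted field to ground-truth contours, the loss below penalizes the deviation between a predicted per-pixel direction and the contour tangent, up to the 180-degree symmetry of an undirected line. The paper's actual frame field is a richer parameterization; everything here is a toy stand-in:

```python
import torch

def direction_alignment_loss(pred_angles, tangent_angles, contour_mask):
    """pred_angles, tangent_angles, contour_mask: (B, H, W) tensors; the
    mask is 1 on ground-truth contour pixels and 0 elsewhere."""
    delta = pred_angles - tangent_angles
    per_pixel = 1.0 - torch.cos(2.0 * delta)  # zero when aligned mod 180 degrees
    return (per_pixel * contour_mask).sum() / contour_mask.sum().clamp(min=1)
```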