Related papers: Real-time instance segmentation with polygons using an Intersection-over-Union loss

URL: http://arxiv.org/abs/2305.05490v1
Date: Tue, 9 May 2023 14:43:38 GMT
Title: Real-time instance segmentation with polygons using an Intersection-over-Union loss
Authors: Katia Jodogne-Del Litto, Guillaume-Alexandre Bilodeau
Abstract summary: We improve over CenterPoly by enhancing the classical regression L1 loss with a novel region-based loss and a novel order loss. Experiments show that using a combination of a regression loss and a region-based loss allows significant improvements on the Cityscapes and IDD test set.
Score: 13.020122353444497
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Predicting a binary mask for an object is more accurate but also more computationally expensive than a bounding box. Polygonal masks as developed in CenterPoly can be a good compromise. In this paper, we improve over CenterPoly by enhancing the classical regression L1 loss with a novel region-based loss and a novel order loss, as well as with a new training process for the vertices prediction head. Moreover, the previous methods that predict polygonal masks use different coordinate systems, but it is not clear if one is better than another, if we abstract the architecture requirement. We therefore investigate their impact on the prediction. We also use a new evaluation protocol with oracle predictions for the detection head, to further isolate the segmentation process and better compare the polygonal masks with binary masks. Our instance segmentation method is trained and tested with challenging datasets containing urban scenes, with a high density of road users. Experiments show, in particular, that using a combination of a regression loss and a region-based loss allows significant improvements on the Cityscapes and IDD test set compared to CenterPoly. Moreover the inference stage remains fast enough to reach real-time performance with an average of 0.045 s per frame for 2048$\times$1024 images on a single RTX 2070 GPU. The code is available at https://github.com/KatiaJDL/CenterPoly-v2.
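
Illustrative sketch (not the authors' implementation): the abstract combines an L1 regression loss on polygon vertices with a region-based Intersection-over-Union term. The Python example below shows one way such a combination could be computed for two polygons. The rasterization approach, the function names (polygon_mask, polygon_iou, combined_loss) and the weight iou_weight are all assumptions made for illustration. A rasterized IoU like this one is not differentiable, so it suits evaluation or intuition rather than training; the loss in the paper may be formulated quite differently.

```python
import numpy as np


def polygon_mask(vertices, height, width):
    """Rasterize a polygon given as (x, y) vertices into a boolean mask.

    Uses the even-odd rule, tested at pixel centers (hypothetical helper).
    """
    v = np.asarray(vertices, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(v)
    for i in range(n):
        x1, y1 = v[i]
        x2, y2 = v[(i + 1) % n]
        # Does this edge straddle the horizontal line through the pixel center?
        crosses = (y1 > py) != (y2 > py)
        # x position where the edge meets that horizontal line.
        # Horizontal edges divide by zero here, but `crosses` masks them out.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_int = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle pixels whose rightward ray crosses this edge.
        inside ^= crosses & (px < x_int)
    return inside


def polygon_iou(pred, target, height, width):
    """Intersection-over-Union of two polygons, measured on a pixel grid."""
    a = polygon_mask(pred, height, width)
    b = polygon_mask(target, height, width)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def combined_loss(pred, target, height, width, iou_weight=1.0):
    """L1 vertex regression plus a (1 - IoU) region term.

    iou_weight is a hypothetical balancing factor, not a value from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    l1 = float(np.abs(pred - target).mean())
    region = 1.0 - polygon_iou(pred, target, height, width)
    return l1 + iou_weight * region


# Two 4x4 squares; the prediction is shifted 2 pixels to the right.
target_square = [(0, 0), (4, 0), (4, 4), (0, 4)]
pred_square = [(2, 0), (6, 0), (6, 4), (2, 4)]
iou = polygon_iou(pred_square, target_square, height=8, width=8)
loss = combined_loss(pred_square, target_square, height=8, width=8)
```

In this toy case the squares overlap on 8 of 24 covered pixels, so the IoU is 1/3, while the L1 term only sees that each vertex moved by 2 pixels along x. The two terms capture different aspects of the error, which is the intuition behind combining a regression loss with a region-based one.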

Related papers

High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution [87.56382172827526]
High-frequency regions are most critical for reconstruction. We propose a training-free adaptive masking module for acceleration. Our method reduces FLOPs by 24--43% for state-of-the-art models.
arXiv Detail & Related papers (2025-05-11T13:18:03Z)
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark [52.339936954958034]
The dynamic imbalance of the fore-background is a major challenge in video object counting. We propose a density-embedded Efficient Masked Autoencoder Counting (E-MAC) framework in this paper. In addition, we first propose a large video bird counting dataset, DroneBird, in natural scenarios for migratory bird protection.
arXiv Detail & Related papers (2024-11-20T06:08:21Z)
Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text [27.556486778356014]
Transformer-based text detection techniques have sought to predict polygons. We present an innovative approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon prediction. Our method ensures precision by iteratively refining polygon predictions, considering both the scale and location of preceding results.
arXiv Detail & Related papers (2023-09-20T12:19:07Z)
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation [20.55281741205142]
Instead of directly predicting pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input. For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly.
arXiv Detail & Related papers (2023-02-14T23:00:25Z)
Accurate Polygonal Mapping of Buildings in Satellite Imagery [30.262871819346213]
This paper studies the problem of polygonal mapping of buildings by tackling the issue of mask reversibility. We propose a novel interaction mechanism of feature embedding sourced from different levels of supervision signals to obtain reversible building masks. We show that the learned reversible building masks take all the merits of the advances of deep convolutional neural networks for high-performing polygonal mapping of buildings.
arXiv Detail & Related papers (2022-08-01T04:54:55Z)
Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
Planning and Learning with Adaptive Lookahead [74.39132848733847]
Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation. Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration. We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
arXiv Detail & Related papers (2022-01-28T20:26:55Z)
CenterPoly: real-time instance segmentation using bounding polygons [11.365829102707014]
We present a novel method, called CenterPoly, for real-time instance segmentation using bounding polygons. We apply it to detect road users in dense urban environments, making it suitable for applications in intelligent transportation systems like automated vehicles. Most of the network parameters are shared by the network heads, making it fast and lightweight enough to run at real-time speed.
arXiv Detail & Related papers (2021-08-19T21:31:30Z)
BoxInst: High-Performance Instance Segmentation with Box Annotations [102.10713189544947]
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training. Our core idea is to exploit the loss of learning masks in instance segmentation, with no modification to the segmentation network itself.
arXiv Detail & Related papers (2020-12-03T22:27:55Z)
Gaussian Vector: An Efficient Solution for Facial Landmark Detection [3.058685580689605]
This paper proposes a new solution, Gaussian Vector, to preserve the spatial information as well as reduce the output size and simplify the post-processing. We evaluate our method on 300W, COFW, WFLW and JD-landmark.
arXiv Detail & Related papers (2020-10-03T10:15:41Z)
Towards Accurate Pixel-wise Object Tracking by Attention Retrieval [50.06436600343181]
We propose an attention retrieval network (ARN) to perform soft spatial constraints on backbone features. We set a new state-of-the-art on recent pixel-wise object tracking benchmark VOT 2020 while running at 40 fps.
arXiv Detail & Related papers (2020-08-06T16:25:23Z)
Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation [91.12575065731883]
We propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS). The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted $\ell_n$-norm loss and IoU-based loss. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR.
arXiv Detail & Related papers (2020-05-07T16:00:27Z)
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild [104.61677518999976]
We propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression. To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum.
arXiv Detail & Related papers (2020-03-08T12:23:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.