Real-time instance segmentation with polygons using an
Intersection-over-Union loss
- URL: http://arxiv.org/abs/2305.05490v1
- Date: Tue, 9 May 2023 14:43:38 GMT
- Title: Real-time instance segmentation with polygons using an
Intersection-over-Union loss
- Authors: Katia Jodogne-Del Litto, Guillaume-Alexandre Bilodeau
- Abstract summary: We improve over CenterPoly by enhancing the classical regression L1 loss with a novel region-based loss and a novel order loss.
Experiments show that using a combination of a regression loss and a region-based loss allows significant improvements on the Cityscapes and IDD test sets.
- Score: 13.020122353444497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting a binary mask for an object is more accurate but also more
computationally expensive than a bounding box. Polygonal masks as developed in
CenterPoly can be a good compromise. In this paper, we improve over CenterPoly
by enhancing the classical regression L1 loss with a novel region-based loss
and a novel order loss, as well as with a new training process for the vertex
prediction head. Moreover, previous methods that predict polygonal masks
use different coordinate systems, but it is not clear whether one is better than
another once architectural requirements are set aside. We therefore investigate
their impact on the prediction. We also use a new evaluation protocol with
oracle predictions for the detection head, to further isolate the segmentation
process and better compare the polygonal masks with binary masks. Our instance
segmentation method is trained and tested with challenging datasets containing
urban scenes, with a high density of road users. Experiments show, in
particular, that using a combination of a regression loss and a region-based
loss allows significant improvements on the Cityscapes and IDD test sets
compared to CenterPoly. Moreover, the inference stage remains fast enough to
reach real-time performance with an average of 0.045 s per frame for
2048$\times$1024 images on a single RTX 2070 GPU. The code is available at
https://github.com/KatiaJDL/CenterPoly-v2.
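As a rough illustration of what combining a regression loss with a region-based loss can look like, the sketch below adds a mean L1 distance between polygon vertices to a (1 - IoU) term computed on rasterized masks. It is not the authors' implementation: the function names, the pixel-center rasterization, and the `iou_weight` parameter are all assumptions made for this example, and the rasterized IoU here is not differentiable, whereas a training loss would need to be.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly: List[Point], width: int, height: int) -> List[List[bool]]:
    """Binary mask of the polygon, sampled at pixel centers (hypothetical helper)."""
    return [
        [point_in_polygon(col + 0.5, row + 0.5, poly) for col in range(width)]
        for row in range(height)
    ]


def mask_iou(a: List[List[bool]], b: List[List[bool]]) -> float:
    """Intersection-over-Union of two binary masks of the same size."""
    inter = sum(pa and pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    union = sum(pa or pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return inter / union if union else 0.0


def l1_loss(pred: List[Point], target: List[Point]) -> float:
    """Mean absolute error over all vertex coordinates (vertices matched by index)."""
    total = sum(abs(px - tx) + abs(py - ty) for (px, py), (tx, ty) in zip(pred, target))
    return total / (2 * len(pred))


def combined_loss(pred: List[Point], target: List[Point],
                  width: int, height: int, iou_weight: float = 1.0) -> float:
    """Regression term plus region term; iou_weight is an assumed balancing factor."""
    iou = mask_iou(rasterize(pred, width, height), rasterize(target, width, height))
    return l1_loss(pred, target) + iou_weight * (1.0 - iou)


# Toy example: a 4x4 square and the same square shifted 2 pixels to the right.
target_square = [(2.0, 2.0), (6.0, 2.0), (6.0, 6.0), (2.0, 6.0)]
shifted_square = [(4.0, 2.0), (8.0, 2.0), (8.0, 6.0), (4.0, 6.0)]
example_iou = mask_iou(rasterize(shifted_square, 8, 8), rasterize(target_square, 8, 8))
example_loss = combined_loss(shifted_square, target_square, 8, 8)
```

In this toy case the two squares overlap on 8 of 24 covered pixels, so the region term penalizes the shift even though every vertex has the same L1 error; the vertex-wise term alone cannot tell how much area is actually shared.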
Related papers
- Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and
Rotated Text [27.556486778356014]
Transformer-based text detection techniques have sought to predict polygons.
We present an innovative approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon prediction.
Our method ensures precision by iteratively refining polygon predictions, considering both the scale and location of preceding results.
arXiv Detail & Related papers (2023-09-20T12:19:07Z)
- PolyFormer: Referring Image Segmentation as Sequential Polygon
Generation [20.55281741205142]
Instead of directly predicting pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation.
This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input.
For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly.
arXiv Detail & Related papers (2023-02-14T23:00:25Z)
- Accurate Polygonal Mapping of Buildings in Satellite Imagery [30.262871819346213]
This paper studies the problem of polygonal mapping of buildings by tackling the issue of mask reversibility.
We propose a novel interaction mechanism of feature embedding sourced from different levels of supervision signals to obtain reversible building masks.
We show that the learned reversible building masks take all the merits of the advances of deep convolutional neural networks for high-performing polygonal mapping of buildings.
arXiv Detail & Related papers (2022-08-01T04:54:55Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- Planning and Learning with Adaptive Lookahead [74.39132848733847]
The Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
arXiv Detail & Related papers (2022-01-28T20:26:55Z)
- CenterPoly: real-time instance segmentation using bounding polygons [11.365829102707014]
We present a novel method, called CenterPoly, for real-time instance segmentation using bounding polygons.
We apply it to detect road users in dense urban environments, making it suitable for applications in intelligent transportation systems like automated vehicles.
Most of the network parameters are shared by the network heads, making it fast and lightweight enough to run at real-time speed.
arXiv Detail & Related papers (2021-08-19T21:31:30Z)
- BoxInst: High-Performance Instance Segmentation with Box Annotations [102.10713189544947]
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.
Our core idea is to exploit the loss of learning masks in instance segmentation, with no modification to the segmentation network itself.
arXiv Detail & Related papers (2020-12-03T22:27:55Z)
- Gaussian Vector: An Efficient Solution for Facial Landmark Detection [3.058685580689605]
This paper proposes a new solution, Gaussian Vector, to preserve the spatial information as well as reduce the output size and simplify the post-processing.
We evaluate our method on 300W, COFW, WFLW and JD-landmark.
arXiv Detail & Related papers (2020-10-03T10:15:41Z)
- Towards Accurate Pixel-wise Object Tracking by Attention Retrieval [50.06436600343181]
We propose an attention retrieval network (ARN) to perform soft spatial constraints on backbone features.
We set a new state-of-the-art on recent pixel-wise object tracking benchmark VOT 2020 while running at 40 fps.
arXiv Detail & Related papers (2020-08-06T16:25:23Z)
- Enhancing Geometric Factors in Model Learning and Inference for Object
Detection and Instance Segmentation [91.12575065731883]
We propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS).
The training of deep models using CIoU loss results in consistent AP and AR improvements compared with the widely adopted $\ell_n$-norm and IoU-based losses.
Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR.
arXiv Detail & Related papers (2020-05-07T16:00:27Z)
- Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the
Wild [104.61677518999976]
We propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection.
The proposed model is equipped with a novel detection head based on heatmap regression.
To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum.
arXiv Detail & Related papers (2020-03-08T12:23:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.