Related papers: INSTA-YOLO: Real-Time Instance Segmentation

INSTA-YOLO: Real-Time Instance Segmentation

URL: http://arxiv.org/abs/2102.06777v1
Date: Fri, 12 Feb 2021 21:17:29 GMT
Title: INSTA-YOLO: Real-Time Instance Segmentation
Authors: Eslam Mohamed, Abdelrahman Shaker, Hazem Rashed, Ahmad El-Sallab, Mayada Hadhoud
Abstract summary: We propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation. Instead of pixel-wise prediction, our model predicts instances as object contours represented by 2D points in Cartesian space. We evaluate our model on three datasets, namely, Carvana,Cityscapes and Airbus.
Score: 2.9769485817170387
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Instance segmentation has gained recently huge attention in various computer vision applications. It aims at providing different IDs to different objects of the scene, even if they belong to the same class. Instance segmentation is usually performed as a two-stage pipeline. First, an object is detected, then semantic segmentation within the detected box area is performed which involves costly up-sampling. In this paper, we propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation. Instead of pixel-wise prediction, our model predicts instances as object contours represented by 2D points in Cartesian space. We evaluate our model on three datasets, namely, Carvana,Cityscapes and Airbus. We compare our results to the state-of-the-art models for instance segmentation. The results show our model achieves competitive accuracy in terms of mAP at twice the speed on GTX-1080 GPU.

Related papers

You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping [119.41166438439313]
YOEO is a single-stage method that outputs instance segmentation and NPCS representations in an end-to-end manner.<n>We use a unified network to generate point-wise semantic labels and centroid offsets, allowing points from the same part instance to vote for the same centroid.<n>We also deploy our synthetically-trained model in a real-world setting, providing real-time visual feedback at 200Hz.
arXiv Detail & Related papers (2025-06-06T03:49:20Z)
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method transfers the box-level annotation as signals to provide an additional pixel-level supervision for Visual Grounding. This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
arXiv Detail & Related papers (2024-07-03T15:30:45Z)
Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals. Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars. Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation [2.861848675707602]
We present a new single-stage architecture called CASAPose. It determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects.
arXiv Detail & Related papers (2022-10-11T10:20:01Z)
Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation. We introduce a scalable pipeline for generating synthetic training data with multiple objects. We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
Human Instance Segmentation and Tracking via Data Association and Single-stage Detector [17.46922710432633]
Human video instance segmentation plays an important role in computer understanding of human activities. Most current VIS methods are based on Mask-RCNN framework. We develop a new method for human video instance segmentation based on single-stage detector.
arXiv Detail & Related papers (2022-03-31T11:36:09Z)
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich-temporal information online. PCAN outperforms current video instance tracking and segmentation competition winners on Youtube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together. We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
Enhanced Boundary Learning for Glass-like Object Segmentation [55.45473926510806]
This paper aims to solve the glass-like object segmentation problem via enhanced boundary learning. In particular, we first propose a novel refined differential module for generating finer boundary cues. An edge-aware point-based graph convolution network module is proposed to model the global shape representation along the boundary.
arXiv Detail & Related papers (2021-03-29T16:18:57Z)
Monocular Instance Motion Segmentation for Autonomous Driving: KITTI InstanceMotSeg Dataset and Multi-task Baseline [5.000331633798637]
Moving object segmentation is a crucial task for autonomous vehicles as it can be used to segment objects in a class agnostic manner. Although pixel-wise motion segmentation has been studied in autonomous driving literature, it has been rarely addressed at the instance level. We create a new InstanceMotSeg dataset comprising of 12.9K samples improving upon our KITTIMoSeg dataset.
arXiv Detail & Related papers (2020-08-16T21:47:09Z)
EOLO: Embedded Object Segmentation only Look Once [0.0]
We introduce an anchor-free and single-shot instance segmentation method, which is conceptually simple with 3 independent branches, fully convolutional and can be used by easily embedding it into mobile and embedded devices. Our method, refer as EOLO, reformulates the instance segmentation problem as predicting semantic segmentation and distinguishing overlapping objects problem, through instance center classification and 4D distance regression on each pixel. Without any bells and whistles, EOLO achieves 27.7$%$ in mask mAP under IoU50 and reaches 30 FPS on 1080Ti GPU, with a single-model and single-scale training/testing on
arXiv Detail & Related papers (2020-03-31T21:22:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.