SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network
- URL: http://arxiv.org/abs/2107.05093v1
- Date: Sun, 11 Jul 2021 17:20:32 GMT
- Title: SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network
- Authors: Shuo-En Chang, Yi-Cheng Yang, En-Ting Lin, Pei-Yung Hsiao, Li-Chen Fu
- Abstract summary: We propose a solution to tackle the panoptic segmentation task.
The structure combines the bottom-up method and the top-down method.
The network mainly pays attention to the quality of the mask.
- Score: 5.353718408751182
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Panoptic segmentation combines semantic and instance segmentation: the goal is to classify each pixel and assign it the corresponding instance ID. In this work, we propose a solution to the panoptic segmentation task. The overall structure combines the bottom-up and top-down methods, so better performance can be achieved while execution speed is maintained. The network mainly pays attention to the quality of the mask. In previous work, uneven object contours appear frequently, resulting in low-quality predictions. Accordingly, we propose enhancement features and corresponding loss functions for the silhouettes of objects and backgrounds to improve the masks. Meanwhile, we use a newly proposed confidence score to resolve occlusions and make the network favor higher-quality masks as prediction results. To verify our approach, we conducted experiments on the COCO and Cityscapes datasets and obtained competitive results with fast inference time.
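The abstract's per-pixel goal (a class label plus an instance ID) can be illustrated with a minimal merge of semantic and instance predictions. This is only a sketch of the generic panoptic output format, not the paper's actual method; the function name, the score-ordered painting rule for occlusions, and all data shapes are illustrative assumptions.

```python
import numpy as np

def merge_panoptic(semantic, instances, stuff_classes):
    """Combine a semantic map and scored instance masks into a panoptic map.

    semantic:      (H, W) int array of per-pixel class labels.
    instances:     list of (mask, class_id, score), mask a (H, W) bool array.
    stuff_classes: set of class ids treated as amorphous background ("stuff").

    Returns an (H, W, 2) array: channel 0 holds the class id,
    channel 1 the instance id (0 for stuff pixels).
    """
    h, w = semantic.shape
    panoptic = np.zeros((h, w, 2), dtype=np.int32)
    # Stuff pixels keep their semantic label with instance id 0.
    for c in stuff_classes:
        panoptic[semantic == c, 0] = c
    # Paint instances in increasing score order so higher-confidence masks
    # overwrite lower-confidence ones where they overlap (a simple
    # occlusion-resolution rule).
    for inst_id, (mask, class_id, score) in enumerate(
            sorted(instances, key=lambda t: t[2]), start=1):
        panoptic[mask, 0] = class_id
        panoptic[mask, 1] = inst_id
    return panoptic
```

Painting in score order is one common way to decide overlaps; SE-PSNet's contribution is precisely a better confidence score for making this choice.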
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - Simple Contrastive Graph Clustering [41.396185271303956]
We propose a Simple Contrastive Graph Clustering (SCGC) algorithm to improve the existing methods.
Our algorithm outperforms recent contrastive deep clustering competitors with at least a seven-fold speedup on average.
arXiv Detail & Related papers (2022-05-11T06:45:19Z) - PFENet++: Boosting Few-shot Semantic Segmentation with the Noise-filtered Context-aware Prior Mask [62.37727055343632]
We revisit the prior mask guidance proposed in "Prior Guided Feature Enrichment Network for Few-Shot Segmentation".
We propose the Context-aware Prior Mask (CAPM) that leverages additional nearby semantic cues for better locating the objects in query images.
We take one step further by incorporating a lightweight Noise Suppression Module (NSM) to screen out the unnecessary responses.
arXiv Detail & Related papers (2021-09-28T15:07:43Z) - Self-Supervised Visual Representations Learning by Contrastive Mask Prediction [129.25459808288025]
We propose a novel contrastive mask prediction (CMP) task for visual representation learning.
MaskCo contrasts region-level features instead of view-level features, which makes it possible to identify the positive sample without any assumptions.
We evaluate MaskCo on training datasets beyond ImageNet and compare its performance with MoCo V2.
arXiv Detail & Related papers (2021-08-18T02:50:33Z) - An Efficient Multitask Neural Network for Face Alignment, Head Pose Estimation and Face Tracking [9.39854778804018]
We propose an efficient multitask face alignment, face tracking and head pose estimation network (ATPN).
ATPN achieves improved performance compared to previous state-of-the-art methods while having fewer parameters and FLOPs.
arXiv Detail & Related papers (2021-03-13T04:41:15Z) - Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting.
We propose a novel graph neural network (TG-Net) which captures the local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z) - CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z) - EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion [5.815742965809424]
We propose an Efficient Panoptic Segmentation Network (EPSNet) to tackle the panoptic segmentation task with fast inference speed.
Basically, EPSNet generates masks based on simple linear combination of prototype masks and mask coefficients.
To enhance the quality of shared prototypes, we adopt a module called "cross-layer attention fusion module".
arXiv Detail & Related papers (2020-03-23T09:11:44Z)
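The EPSNet entry above describes instance masks generated as a simple linear combination of prototype masks and per-instance mask coefficients. A minimal NumPy sketch of that assembly step might look like the following; the function name, sigmoid activation, and threshold value are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def assemble_masks(prototypes, coefficients, threshold=0.5):
    """Assemble instance masks as linear combinations of prototype masks.

    prototypes:   (K, H, W) array of K shared prototype masks.
    coefficients: (N, K) array, one K-dim coefficient vector per instance.

    Each instance mask is sigmoid(sum_k coeff[n, k] * prototype[k]),
    binarized at `threshold`.
    """
    k, h, w = prototypes.shape
    # (N, K) @ (K, H*W) -> (N, H*W): one linear combination per instance.
    logits = coefficients @ prototypes.reshape(k, h * w)
    masks = 1.0 / (1.0 + np.exp(-logits))  # elementwise sigmoid
    return masks.reshape(-1, h, w) > threshold
```

The appeal of this design is speed: mask assembly reduces to a single matrix multiplication over shared prototypes instead of running a separate mask head per instance.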
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.