HAISTA-NET: Human Assisted Instance Segmentation Through Attention
- URL: http://arxiv.org/abs/2305.03105v3
- Date: Fri, 8 Mar 2024 13:30:58 GMT
- Title: HAISTA-NET: Human Assisted Instance Segmentation Through Attention
- Authors: Muhammed Korkmaz, T. Metin Sezgin
- Abstract summary: We propose a novel approach to enable more precise predictions and generate higher-quality segmentation masks.
Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries.
We show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former.
- Score: 3.073046540587735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance segmentation is a form of image detection which has a range of
applications, such as object refinement, medical image analysis, and
image/video editing, all of which demand a high degree of accuracy. However,
this precision is often beyond the reach of what even state-of-the-art, fully
automated instance segmentation algorithms can deliver. The performance gap
becomes particularly prohibitive for small and complex objects. Practitioners
typically resort to fully manual annotation, which can be a laborious process.
In order to overcome this problem, we propose a novel approach to enable more
precise predictions and generate higher-quality segmentation masks for
high-curvature, complex and small-scale objects. Our human-assisted
segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network
to incorporate human-specified partial boundaries. We also present a dataset of
hand-drawn partial object boundaries, which we refer to as human attention
maps. In addition, the Partial Sketch Object Boundaries (PSOB) dataset contains
hand-drawn partial object boundaries which represent curvatures of an object's
ground truth mask with several pixels. Through extensive evaluation using the
PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such
as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, achieving respective
increases of +36.7, +29.6, and +26.5 points in AP-Mask metrics for these three
models. We hope that our novel approach will set a baseline for future
human-aided deep learning models by combining fully automated and interactive
instance segmentation architectures.
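The AP-Mask figures quoted in the abstract are built on the overlap between a predicted mask and its ground-truth mask, usually measured as intersection over union (IoU). The following is a minimal sketch of that overlap measure only, not the paper's evaluation code; the function name `mask_iou`, the list-of-lists mask representation, and the toy masks are assumptions made for illustration.

```python
from typing import List

Mask = List[List[int]]  # binary mask as rows of 0/1 values (assumed representation)


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0  # pixel set in both masks
            union += 1 if (p or t) else 0   # pixel set in at least one mask
    # Two empty masks are treated as a perfect match (a convention assumed here).
    return inter / union if union else 1.0


# Toy example: a 2x2 object and a prediction that misses one pixel
# and adds one spurious pixel.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(mask_iou(pred, truth))  # 3 shared pixels / 5 covered pixels = 0.6
```

An average-precision metric then counts a prediction as correct when this IoU exceeds a chosen threshold, so a tighter boundary on a small object can move a prediction from a miss to a hit.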
Related papers
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased property of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
arXiv Detail & Related papers (2022-08-08T17:25:31Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- Neural Volumetric Object Selection [126.04480613166194]
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
arXiv Detail & Related papers (2022-05-30T08:55:20Z)
- SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations [90.8752454643737]
The recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per-grid-cell object masks with fully convolutional networks.
We observe that SOLO generates similar masks for an object at nearby grid cells, and that these neighboring predictions can complement each other, as some may better segment certain object parts.
Motivated by the observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information.
arXiv Detail & Related papers (2022-02-15T13:53:03Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
- Joint Object Contour Points and Semantics for Instance Segmentation [1.2117737635879038]
We propose Mask Point R-CNN aiming at promoting the neural network's attention to the object boundary.
Specifically, we innovatively extend the original human keypoint detection task to the contour point detection of any object.
As a consequence, the model will be more sensitive to the edges of the object and can capture more geometric features.
arXiv Detail & Related papers (2020-08-02T11:11:28Z)
- LevelSet R-CNN: A Deep Variational Method for Instance Segmentation [79.20048372891935]
Currently, many state-of-the-art models are based on the Mask R-CNN framework.
We propose LevelSet R-CNN, which combines the best of both worlds by obtaining powerful feature representations.
We demonstrate the effectiveness of our approach on COCO and Cityscapes datasets.
arXiv Detail & Related papers (2020-07-30T17:52:18Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
- Weakly Supervised Instance Segmentation by Deep Community Learning [39.18749732409763]
We present a weakly supervised instance segmentation algorithm based on deep community learning with multiple tasks.
We address this problem by designing a unified deep neural network architecture.
The proposed algorithm achieves state-of-the-art performance in the weakly supervised setting.
arXiv Detail & Related papers (2020-01-30T08:35:42Z)