CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for
Interactive Image Segmentation
- URL: http://arxiv.org/abs/2303.05620v2
- Date: Mon, 4 Mar 2024 23:05:40 GMT
- Title: CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for
Interactive Image Segmentation
- Authors: Shoukun Sun, Min Xian, Fei Xu, Luca Capriotti, Tiankai Yao
- Abstract summary: We propose a click-based and mask-guided interactive image segmentation framework containing three novel components.
The proposed framework offers a unified inference framework to generate segmentation results in a coarse-to-fine manner.
Our model reduces the number of clicks required to surpass an IoU of 0.95 by 33.2% and 15.5%, respectively, compared with the previous state-of-the-art approach.
- Score: 2.482735440750151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Click-based interactive segmentation aims to extract the object of
interest from an image with the guidance of user clicks. Recent work has
achieved great overall performance by employing feedback from the output.
However, in most state-of-the-art approaches, 1) the inference stage involves
inflexible heuristic rules and requires a separate refinement model, and 2) the
number of user clicks and model performance cannot be balanced. To address the
challenges, we propose a click-based and mask-guided interactive image
segmentation framework containing three novel components: Cascade-Forward
Refinement (CFR), Iterative Click Loss (ICL), and SUEM image augmentation. The
CFR offers a unified inference framework to generate segmentation results in a
coarse-to-fine manner. The proposed ICL allows model training to improve
segmentation and reduce user interactions simultaneously. The proposed SUEM
augmentation is a comprehensive way to create large and diverse training sets
for interactive image segmentation. Extensive experiments demonstrate the
state-of-the-art performance of the proposed approach on five public datasets.
Remarkably, our model reduces the number of clicks required to surpass an IoU
of 0.95 by 33.2\% and 15.5\% on the Berkeley and DAVIS sets, respectively,
compared with the previous state-of-the-art approach.
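The Cascade-Forward Refinement idea described above, reusing a single model so that its own output mask is fed back in as guidance for the next pass, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `model` is a hypothetical callable that accepts an image, the user clicks, and a prior mask, and returns per-pixel logits.

```python
import numpy as np

def cascade_forward_refinement(model, image, clicks, num_passes=3):
    """Coarse-to-fine inference sketch: run the same network several times,
    feeding each pass's soft mask back in as guidance, so no separate
    refinement model is needed."""
    mask = np.zeros(image.shape[:2], dtype=np.float32)  # start from an empty mask
    for _ in range(num_passes):
        logits = model(image, clicks, mask)     # hypothetical mask-guided model
        mask = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> soft mask in [0, 1]
    return (mask > 0.5).astype(np.float32)      # final binary segmentation
```

In this reading, the cascade is purely an inference-time loop: each pass sees a progressively better prior mask, which is what allows a single model to play both the coarse-prediction and refinement roles.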
Related papers
- Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space.
We propose an effective approach to narrow the gap between the two domains.
It mainly facilitates unified mutual information sharing both intra- and inter-samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
- Learning from Exemplars for Interactive Image Segmentation [15.37506525730218]
We introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category.
Our model reduces users' labor by around 15%, requiring two fewer clicks to achieve target IoUs of 85% and 90%.
arXiv Detail & Related papers (2024-06-17T12:38:01Z)
- IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence [2.822194296769473]
Few-shot segmentation techniques reduce the required number of images to learn to segment a new class.
Interactive segmentation techniques, in contrast, focus only on incrementally improving the segmentation of one object at a time.
We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes.
arXiv Detail & Related papers (2024-03-22T10:15:53Z)
- Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
- Open-vocabulary Panoptic Segmentation with Embedding Modulation [71.15502078615587]
Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results.
We propose OPSNet, an omnipotent and data-efficient framework for open-vocabulary panoptic segmentation.
arXiv Detail & Related papers (2023-03-20T17:58:48Z)
- Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains.
We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by training only new modularized blocks while freezing all the original weights.
The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z)
- One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and Inter-Image Attention Design [40.97593636235116]
We propose a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models.
In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images.
We evaluate ICIIA using 3 different recognition tasks with 9 backbone models over 5 representative datasets.
arXiv Detail & Related papers (2022-11-11T15:33:21Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance with 10,800X faster inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding a new state-of-the-art on the widely used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks.
We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
arXiv Detail & Related papers (2020-04-04T22:25:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.