Related papers: MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation

MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation

URL: http://arxiv.org/abs/2401.04403v2
Date: Sat, 3 Feb 2024 03:50:42 GMT
Title: MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation
Authors: Long Xu, Shanghong Li, Yongquan Chen, Jun Luo, Shiwu Lai
Abstract summary: We propose a novel multi-scale token adaptation algorithm for interactive segmentation. By performing top-k operations across multi-scale tokens, the computational complexity is greatly simplified. We also propose a token learning algorithm based on contrastive loss.
Score: 8.46894039954642
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Interactive segmentation has gained significant attention for its application in human-computer interaction and data annotation. To address the target scale variation issue in interactive segmentation, a novel multi-scale token adaptation algorithm is proposed. By performing top-k operations across multi-scale tokens, the computational complexity is greatly simplified while ensuring performance. To enhance the robustness of multi-scale token selection, we also propose a token learning algorithm based on contrastive loss. This algorithm can effectively improve the performance of multi-scale token adaptation. Extensive benchmarking shows that the algorithm achieves state-of-the-art (SOTA) performance, compared to current methods. An interactive demo and all reproducible codes will be released at https://github.com/hahamyt/mst.

Related papers

Evaluation framework for Image Segmentation Algorithms [0.0]
This paper introduces the fundamental concepts and importance of image segmentation, and the role of interactive segmentation in enhancing accuracy. A detailed background theory section explores various segmentation methods, including thresholding, edge detection, region growing, feature extraction, random forests, support vector machines, convolutional neural networks, U-Net, and Mask R-CNN. A comparative analysis presents detailed results, emphasizing the strengths, limitations, and trade-offs of each method.
arXiv Detail & Related papers (2025-04-06T10:20:26Z)
Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis [1.6916040234975798]
Traditional deep learning methods in medical imaging often focus solely on segmentation or classification. We propose a simple yet effective UNet-based MTL model, where features extracted by the encoder are used to predict classification labels, while the decoder produces the segmentation mask. Experimental results across multiple medical datasets confirm the superior performance of our model in both segmentation and classification tasks.
arXiv Detail & Related papers (2024-11-30T04:20:05Z)
Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation [12.249546377051438]
token merging has exhibited remarkable enhancements in inference speed, training efficiency, and memory utilization for image classification tasks. This paper facilitates the deployment of transformer-based architectures on resource-constrained devices and in real-time applications.
arXiv Detail & Related papers (2024-05-23T11:54:27Z)
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens [57.37893387775829]
We introduce a fast and balanced clustering method, named textbfSemantic textbfEquitable textbfClustering (SEC) SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner. We propose a versatile vision backbone, SECViT, to serve as a vision language connector.
arXiv Detail & Related papers (2024-05-22T04:49:00Z)
Subobject-level Image Tokenization [60.80949852899857]
Patch-based image tokenization ignores the morphology of the visual world. Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation. We show that subobject tokenization enables faster convergence and better generalization while using fewer visual tokens.
arXiv Detail & Related papers (2024-02-22T06:47:44Z)
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segments an image from a language expression. We develop an algorithm that shifts from being localization-centric to segmentation-language. Compared to its counterparts, our method is more versatile yet effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z)
Multi-level Contrast Network for Wearables-based Joint Activity Segmentation and Recognition [10.828099015828693]
Human activity recognition (HAR) with wearables is promising research that can be widely adopted in many smart healthcare applications. Most HAR algorithms are susceptible to the multi-class windows problem that is essential yet rarely exploited. We introduce the segmentation technology into HAR, yielding joint activity segmentation and recognition.
arXiv Detail & Related papers (2022-08-16T05:39:02Z)
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval [67.21528544724546]
In CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens due to the redundancy nature of consecutive frames in videos. This significantly increases computation costs and hinders the deployment of video retrieval models in web applications. In this paper, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.
arXiv Detail & Related papers (2022-05-02T12:02:09Z)
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks. Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations. The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
arXiv Detail & Related papers (2021-07-25T11:39:58Z)
Reviving Iterative Training with Mask Guidance for Interactive Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps. We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z)
Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens. We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels. To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit. Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.