TongueSAM: An Universal Tongue Segmentation Model Based on SAM with
Zero-Shot
- URL: http://arxiv.org/abs/2308.06444v3
- Date: Wed, 6 Dec 2023 02:11:15 GMT
- Title: TongueSAM: An Universal Tongue Segmentation Model Based on SAM with
Zero-Shot
- Authors: Shan Cao, Qunsheng Ruan and Linjian Ma
- Abstract summary: Tongue segmentation serves as the primary step in automated TCM tongue diagnosis.
This paper proposes a universal tongue segmentation model named TongueSAM based on SAM (Segment Anything Model).
- Score: 8.211898244288305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tongue segmentation serves as the primary step in automated TCM tongue
diagnosis, which plays a significant role in the diagnostic results. Currently,
numerous deep learning based methods have achieved promising results. However,
when confronted with tongue images that differ from the training set or possess
challenging backgrounds, these methods demonstrate limited performance. To
address this issue, this paper proposes a universal tongue segmentation model
named TongueSAM based on SAM (Segment Anything Model). SAM is a large-scale
pretrained interactive segmentation model known for its powerful zero-shot
generalization capability. Applying SAM to tongue segmentation leverages the
prior knowledge it learned from natural images, enabling zero-shot segmentation
of various types of tongue images. In this study, a
Prompt Generator based on object detection is integrated into SAM to enable an
end-to-end automated tongue segmentation method. Experiments demonstrate that
TongueSAM achieves exceptional performance across various tongue segmentation
datasets, particularly in the zero-shot setting. Even on tongue images with
challenging backgrounds, TongueSAM achieves an mIoU of 95.23% under zero-shot
conditions, surpassing other segmentation methods. As far as we know, this is
the first application of a large-scale pretrained model to tongue
segmentation. The project mentioned in this paper is currently publicly
available.
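A minimal sketch of the box-prompted pipeline described above, wiring a placeholder tongue detector into SAM's interactive predictor through the official segment-anything API. The detector function (detect_tongue_box), the vit_b backbone, and the checkpoint path are illustrative assumptions; the paper's actual Prompt Generator and training details are not reproduced here.

# Sketch: detect a bounding box, then use it as the prompt for SAM.
# Assumes the segment-anything package and a downloaded SAM checkpoint.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def detect_tongue_box(image_rgb: np.ndarray) -> np.ndarray:
    """Hypothetical tongue detector returning one [x0, y0, x1, y1] box.
    In practice this would be a trained detector (e.g. a YOLO-style model)."""
    h, w = image_rgb.shape[:2]
    return np.array([0.25 * w, 0.40 * h, 0.75 * w, 0.90 * h])  # dummy box

def segment_tongue(image_path: str, checkpoint: str = "sam_vit_b.pth") -> np.ndarray:
    image_rgb = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)

    # Load a pretrained SAM backbone and wrap it in the interactive predictor.
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)

    # The detected box replaces manual clicks, making the pipeline end-to-end.
    box = detect_tongue_box(image_rgb)
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    return masks[0]  # boolean tongue mask, shape (H, W)

if __name__ == "__main__":
    mask = segment_tongue("tongue.jpg")
    print("Tongue pixels:", int(mask.sum()))

Swapping the dummy detector for a trained tongue detector gives the prompt-free, end-to-end behavior the abstract describes.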
Related papers
- Boosting Segment Anything Model Towards Open-Vocabulary Learning [69.42565443181017]
Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics.
We present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z)
- Segment and Caption Anything [126.20201216616137]
We propose a method to efficiently equip the Segment Anything Model with the ability to generate regional captions.
By introducing a lightweight query-based feature mixer, we align the region-specific features with the embedding space of language models for later caption generation.
We conduct extensive experiments to demonstrate the superiority of our method and validate each design choice.
arXiv Detail & Related papers (2023-12-01T19:00:17Z)
- Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models [14.292149307183967]
This research introduces a structured framework designed for the automation of few-shot semantic segmentation.
It utilizes the SAM model and facilitates a more efficient generation of semantically discernible segmentation outcomes.
Central to our methodology is a novel automatic prompt learning approach, leveraging prior guided masks to produce coarse pixel-wise prompts for SAM.
arXiv Detail & Related papers (2023-11-22T07:07:55Z)
- Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM) [8.529233820032678]
The Segment Anything Model (SAM) is the first foundation model for image segmentation.
In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups.
Our investigation centers on SAM's zero-shot learning abilities and the effectiveness of prompts like bounding boxes or point clicks.
arXiv Detail & Related papers (2023-11-14T11:05:08Z)
- LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: Large Language Instructed Segmentation Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z)
- Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model [36.015065439244495]
The Segment Anything Model (SAM) is a recently developed large model for general-purpose segmentation for computer vision tasks.
SAM was trained using 11 million images with over 1 billion masks and can produce segmentation results for a wide range of objects in natural scene images.
This paper shows that although SAM does not immediately give high-quality segmentation for medical image data, its generated masks, features, and stability scores are useful for building and training better medical image segmentation models; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-04-22T07:11:53Z)
- SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model [1.1221592576472588]
We evaluate the zero-shot capabilities of the Segment Anything Model for medical image segmentation.
We show that SAM generalizes well to CT data, making it a potential catalyst for the advancement of semi-automatic segmentation tools.
arXiv Detail & Related papers (2023-04-10T18:20:29Z)
- Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging [12.533476185972527]
The segment anything model (SAM) was released as a foundation model for image segmentation.
We evaluate the zero-shot segmentation performance of the SAM model on representative segmentation tasks on whole slide imaging (WSI).
The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects.
arXiv Detail & Related papers (2023-04-09T04:06:59Z)
- M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Multimodal Knowledge Alignment with Reinforcement Learning [103.68816413817372]
ESPER extends language-only zero-shot models to unseen multimodal tasks, like image and audio captioning.
Our key novelty is to use reinforcement learning to align multimodal inputs to language model generations without direct supervision.
Experiments demonstrate that ESPER outperforms baselines and prior work on a variety of zero-shot tasks.
arXiv Detail & Related papers (2022-05-25T10:12:17Z)
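Following up on the Input Augmentation with SAM entry above: one plausible way to exploit SAM's generated masks and stability scores is to stack a SAM-derived prior onto the image as an extra input channel for a downstream segmentation network. The sketch below illustrates that general idea under assumed names (sam_augmented_input, the vit_b checkpoint path); it is not the paper's exact recipe.

# Sketch: build an RGB + SAM-prior input tensor for a downstream model.
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def sam_augmented_input(image_rgb: np.ndarray,
                        checkpoint: str = "sam_vit_b.pth") -> np.ndarray:
    # Checkpoint path and backbone choice are assumptions for illustration.
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    masks = SamAutomaticMaskGenerator(sam).generate(image_rgb)

    # Fuse all automatically generated masks, weighting each by its stability score.
    h, w = image_rgb.shape[:2]
    prior = np.zeros((h, w), dtype=np.float32)
    for m in masks:
        prior = np.maximum(prior, m["stability_score"] * m["segmentation"].astype(np.float32))

    # Return an (H, W, 4) array: normalized RGB plus the SAM-derived prior channel.
    return np.concatenate([image_rgb.astype(np.float32) / 255.0, prior[..., None]], axis=-1)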
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.