FoodSAM: Any Food Segmentation
- URL: http://arxiv.org/abs/2308.05938v1
- Date: Fri, 11 Aug 2023 04:42:10 GMT
- Title: FoodSAM: Any Food Segmentation
- Authors: Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue
- Abstract summary: We propose a novel framework, called FoodSAM, to address the lack of class-specific information in SAM-generated masks.
FoodSAM integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality.
FoodSAM stands as the first-ever work to achieve instance, panoptic, and promptable segmentation on food images.
- Score: 10.467966270491228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore the zero-shot capability of the Segment Anything
Model (SAM) for food image segmentation. To address the lack of class-specific
information in SAM-generated masks, we propose a novel framework, called
FoodSAM. This innovative approach integrates the coarse semantic mask with
SAM-generated masks to enhance semantic segmentation quality. In addition, we
recognize that the ingredients in food can be regarded as independent
instances, which motivates us to perform instance segmentation on food
images. Furthermore, FoodSAM extends its zero-shot capability to encompass
panoptic segmentation by incorporating an object detector, which enables
FoodSAM to effectively capture non-food object information. Drawing inspiration
from the recent success of promptable segmentation, we also extend FoodSAM to
promptable segmentation, supporting various prompt variants. Consequently,
FoodSAM emerges as an all-encompassing solution capable of segmenting food
items at multiple levels of granularity. Remarkably, this pioneering framework
stands as the first-ever work to achieve instance, panoptic, and promptable
segmentation on food images. Extensive experiments demonstrate the feasibility
and impressive performance of FoodSAM, validating SAM's potential as a
prominent and influential tool within the domain of food image segmentation. We
release our code at https://github.com/jamesjg/FoodSAM.
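
The mask-merging step described above lends itself to a short sketch: SAM's class-agnostic masks vote on labels taken from a coarse semantic map, so each mask inherits a single class while keeping SAM's sharp boundaries. The NumPy function below is a minimal illustration of that idea, assuming a simple majority-vote rule; it is not the actual implementation from the FoodSAM repository.

```python
import numpy as np

def merge_sam_with_semantic(sam_masks, semantic_map, background_label=0):
    """Assign each class-agnostic SAM mask the majority class from a coarse
    semantic map. Illustrative sketch only; the voting rule and function
    name are assumptions, not FoodSAM's exact code.

    sam_masks    : list of HxW boolean arrays from SAM's automatic mask generator
    semantic_map : HxW integer array from any coarse semantic segmentation model
    returns      : HxW integer array with mask-aligned class labels
    """
    refined = np.full(semantic_map.shape, background_label, dtype=semantic_map.dtype)
    # Process larger masks first so smaller, finer masks can overwrite them.
    for mask in sorted(sam_masks, key=lambda m: m.sum(), reverse=True):
        labels, counts = np.unique(semantic_map[mask], return_counts=True)
        if labels.size == 0:
            continue
        majority = labels[np.argmax(counts)]
        if majority != background_label:
            refined[mask] = majority
    return refined
```

Because each refined mask now carries its own label, the same per-mask view can be read out as instance segmentation, and adding boxes from an off-the-shelf object detector for non-food objects extends it toward the panoptic setting described in the abstract.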
Related papers
- A SAM based Tool for Semi-Automatic Food Annotation [0.0]
We present a demo of a semi-automatic food image annotation tool leveraging the Segment Anything Model (SAM).
The tool enables prompt-based food segmentation via user interactions, promoting user engagement and allowing them to further categorise food items within meal images.
We also release a fine-tuned version of SAM's mask decoder, dubbed MealSAM, with the ViT-B backbone tailored specifically for food image segmentation.
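
The annotation tool's own interface is not reproduced here, but prompt-based segmentation of a meal photo with the segment-anything library looks roughly like the sketch below; the checkpoint path, image file, and click coordinates are placeholders.

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load a ViT-B SAM checkpoint (file name is a placeholder for a local download).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Placeholder meal image; any HxWx3 uint8 RGB array works.
image = np.array(Image.open("meal.jpg").convert("RGB"))
predictor.set_image(image)

# A single foreground click on a food item (coordinates are illustrative).
point_coords = np.array([[320, 240]])
point_labels = np.array([1])  # 1 = foreground, 0 = background

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring proposal
```

A fine-tuned decoder such as MealSAM would presumably be used in the same fashion, with its weights swapped in for the released checkpoint.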
arXiv Detail & Related papers (2024-10-11T11:50:10Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, ranging from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- VRP-SAM: SAM with Visual Reference Prompt [73.05676082695459]
We propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation.
In essence, VRP-SAM can utilize annotated reference images to comprehend specific objects and segment them in the target image.
arXiv Detail & Related papers (2024-02-27T17:58:09Z)
- PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation [19.65118388712439]
We introduce a novel prompt-driven adapter into SAM, namely the Prompt Adapter Segment Anything Model (PA-SAM).
By exclusively training the prompt adapter, PA-SAM extracts detailed information from images and optimizes the mask decoder features at both sparse and dense prompt levels.
Experimental results demonstrate that our PA-SAM outperforms other SAM-based methods in high-quality, zero-shot, and open-set segmentation.
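
The "train only the adapter" recipe summarized above can be sketched generically: freeze every SAM weight and optimize a small residual module on the prompt path. The PromptAdapter below is a hypothetical stand-in for illustration, not PA-SAM's published architecture.

```python
import torch
import torch.nn as nn

class PromptAdapter(nn.Module):
    """Hypothetical adapter that refines dense prompt embeddings with a small
    residual conv block; a stand-in, not PA-SAM's actual design."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, dense_prompt: torch.Tensor) -> torch.Tensor:
        return dense_prompt + self.refine(dense_prompt)

def adapter_only_parameters(sam: nn.Module, adapter: nn.Module):
    """Freeze the foundation model and return only the adapter's parameters,
    mirroring the 'exclusively training the prompt adapter' idea."""
    for p in sam.parameters():
        p.requires_grad_(False)
    return adapter.parameters()

# Usage sketch:
# optimizer = torch.optim.AdamW(adapter_only_parameters(sam, PromptAdapter()), lr=1e-4)
```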
arXiv Detail & Related papers (2024-01-23T19:20:22Z)
- FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
arXiv Detail & Related papers (2023-12-22T11:56:22Z)
- Boosting Segment Anything Model Towards Open-Vocabulary Learning [69.42565443181017]
Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics.
We present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on the Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
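
For reference, the mean intersection over union (mIoU) behind the 49.4 figure can be computed per class as below; this per-image sketch is a simplification, since benchmark scores accumulate intersections and unions over the whole validation set.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean intersection-over-union between two HxW integer label maps.
    Classes absent from both prediction and ground truth are skipped."""
    valid = gt != ignore_index
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        g = (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class not present in this pair
        ious.append(np.logical_and(p, g).sum() / union)
    return 100.0 * float(np.mean(ious)) if ious else 0.0
```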
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model [36.015065439244495]
The Segment Anything Model (SAM) is a recently developed large model for general-purpose segmentation across computer vision tasks.
SAM was trained using 11 million images with over 1 billion masks and can produce segmentation results for a wide range of objects in natural scene images.
This paper shows that although SAM does not immediately give high-quality segmentation for medical image data, its generated masks, features, and stability scores are useful for building and training better medical image segmentation models.
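
The input-augmentation idea can be sketched as stacking SAM-derived channels onto the raw image before training a conventional segmentation network; the particular channels below (mask coverage and per-pixel stability) are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def augment_input_with_sam(image, sam_masks):
    """Append SAM-derived channels to an image for a downstream segmenter.

    image     : HxWxC float array (the raw medical image)
    sam_masks : list of dicts from SAM's automatic mask generator, each with a
                boolean 'segmentation' map and a float 'stability_score'
    """
    h, w = image.shape[:2]
    coverage = np.zeros((h, w), dtype=np.float32)   # how many masks cover each pixel
    stability = np.zeros((h, w), dtype=np.float32)  # best stability score per pixel
    for m in sam_masks:
        seg = m["segmentation"].astype(np.float32)
        coverage += seg
        stability = np.maximum(stability, seg * m["stability_score"])
    coverage /= max(coverage.max(), 1e-6)            # normalize to [0, 1]
    return np.concatenate([image, coverage[..., None], stability[..., None]], axis=-1)
```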
arXiv Detail & Related papers (2023-04-22T07:11:53Z)
- SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More [13.047310918166762]
We propose SAM-Adapter, which incorporates domain-specific information or visual prompts into the segmentation network by using simple yet effective adapters.
We can even outperform task-specific network models and achieve state-of-the-art performance in the task we tested: camouflaged object detection.
arXiv Detail & Related papers (2023-04-18T17:38:54Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes; each image has on average 6 ingredient labels with pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.