A Large-Scale Benchmark for Food Image Segmentation
- URL: http://arxiv.org/abs/2105.05409v1
- Date: Wed, 12 May 2021 03:00:07 GMT
- Title: A Large-Scale Benchmark for Food Image Segmentation
- Authors: Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven C.H. Hoi, Qianru
Sun
- Abstract summary: We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes; each image has an average of 6 ingredient labels with pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
- Score: 62.28029856051079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Food image segmentation is a critical and indispensable task for developing
health-related applications such as estimating food calories and nutrients.
Existing food image segmentation models underperform for two reasons:
(1) there is a lack of high quality food image datasets with fine-grained
ingredient labels and pixel-wise location masks -- the existing datasets either
carry coarse ingredient labels or are small in size; and (2) the complex
appearance of food makes it difficult to localize and recognize ingredients in
food images, e.g., the ingredients may overlap one another in the same image,
and the same ingredient may look very different across food images. In
this work, we build a new food image dataset FoodSeg103 (and its extension
FoodSeg154) containing 9,490 images. We annotate these images with 154
ingredient classes; each image has an average of 6 ingredient labels with
pixel-wise masks. In addition, we propose a multi-modality pre-training
approach called ReLeM that explicitly equips a segmentation model with rich and
semantic food knowledge. In experiments, we use three popular semantic
segmentation methods (i.e., Dilated Convolution based, Feature Pyramid based,
and Vision Transformer based) as baselines, and evaluate them as well as ReLeM
on our new datasets. We believe that the FoodSeg103 (and its extension
FoodSeg154) and the pre-trained models using ReLeM can serve as a benchmark to
facilitate future works on fine-grained food image understanding. We make all
these datasets and methods public at
\url{https://xiongweiwu.github.io/foodseg103.html}.
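Results on FoodSeg103, both in this paper and in several of the related papers below, are reported as mean Intersection over Union (mIoU). As a reference point, here is a minimal sketch of the standard mIoU computation in Python; the ignore-index convention and the skipping of classes absent from both masks are common practice, not details taken from this paper.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU averaged over classes (mIoU) for semantic segmentation.

    pred, gt: integer (H, W) arrays of per-pixel class ids.
    Classes absent from both prediction and ground truth are skipped.
    """
    valid = gt != ignore_index
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        gt_c = (gt == c) & valid
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:  # class not present in this image; skip it
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy usage on a 2x2 "image" with 2 classes
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 0], [1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # ~0.583
```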
Related papers
- OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation [43.65207396061584]
OVFoodSeg is a framework that enhances text embeddings with visual context.
The training process of OVFoodSeg is divided into two stages: the pre-training of FoodLearner and the subsequent learning phase for segmentation.
By addressing the deficiencies of previous models, OVFoodSeg demonstrates a significant improvement, achieving a 4.9% increase in mean Intersection over Union (mIoU) on the FoodSeg103 dataset.
arXiv Detail & Related papers (2024-04-01T18:26:29Z)
- Recognizing Multiple Ingredients in Food Images Using a Single-Ingredient Classification Model [4.409722014494348]
This study introduces an advanced approach for recognizing ingredients segmented from food images.
The method localizes candidate ingredient regions using locating and sliding-window techniques (a rough sketch follows below).
A novel model pruning method is proposed that enhances the efficiency of the classification model.
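As a rough illustration of sliding-window candidate generation, here is a generic sketch; the window size, stride, threshold, and the single-ingredient `score_fn` classifier are placeholder assumptions, not the paper's exact procedure.

```python
import numpy as np

def sliding_window_candidates(image, score_fn, win=64, stride=32, thresh=0.5):
    """Slide a fixed-size window over the image and keep windows whose
    single-ingredient classifier score exceeds a threshold.

    image: (H, W, C) array; score_fn: callable mapping a patch to a probability.
    Returns a list of (x, y, win, win, score) candidate boxes.
    """
    h, w = image.shape[:2]
    candidates = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            score = score_fn(patch)
            if score >= thresh:
                candidates.append((x, y, win, win, score))
    return candidates
```

In practice the retained boxes would be de-duplicated (e.g., by non-maximum suppression) before per-window labels are aggregated into the image-level ingredient list.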
arXiv Detail & Related papers (2024-01-26T00:46:56Z)
- NutritionVerse-Real: An Open Access Manually Collected 2D Food Scene Dataset for Dietary Intake Estimation [68.49526750115429]
We introduce NutritionVerse-Real, an open access manually collected 2D food scene dataset for dietary intake estimation.
The NutritionVerse-Real dataset was created by manually collecting images of food scenes in real life, measuring the weight of every ingredient and computing the associated dietary content of each dish.
arXiv Detail & Related papers (2023-11-20T11:05:20Z)
- Muti-Stage Hierarchical Food Classification [9.013592803864086]
We propose a multi-stage hierarchical framework for food item classification by iteratively clustering and merging food items during the training process.
Our method is evaluated on the VFN-nutrient dataset and achieves promising results compared with existing work on both food type and food item classification.
arXiv Detail & Related papers (2023-09-03T04:45:44Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Towards the Creation of a Nutrition and Food Group Based Image Database [58.429385707376554]
We propose a framework to create a nutrition and food group based image database.
We design a protocol for linking food group based food codes in the U.S. Department of Agriculture's (USDA) Food and Nutrient Database for Dietary Studies (FNDDS) to food images.
Our proposed method is used to build a nutrition and food group based image database including 16,114 food images.
arXiv Detail & Related papers (2022-06-05T02:41:44Z)
- Saliency-Aware Class-Agnostic Food Image Segmentation [10.664526852464812]
We propose a class-agnostic food image segmentation method.
Using information from both the before-eating and after-eating images, we segment food by finding the salient missing objects (a loose sketch follows below).
Our method is validated on food images collected from a dietary study.
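As a loose illustration of the before/after idea, the sketch below uses simple pixel differencing with a fixed threshold; the assumption of aligned images and the thresholding stand in for the paper's actual saliency model.

```python
import numpy as np

def missing_food_mask(before, after, thresh=30):
    """Mask the region that is present in the before-eating image but
    gone in the after-eating image (i.e., the consumed food).

    before, after: aligned uint8 grayscale images of the same plate.
    """
    diff = before.astype(np.int16) - after.astype(np.int16)
    return diff > thresh  # positive change: content that disappeared
```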
arXiv Detail & Related papers (2021-02-13T08:05:19Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long paragraphs with no annotations of their structure.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities (a generic sketch follows after this list).
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
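As a generic sketch of the joint-embedding recipe behind such cross-modal retrieval methods (not SCAN itself; the linear encoders, dimensions, and loss weights below are placeholder assumptions), matched image-recipe pairs are pulled together with a triplet loss over in-batch negatives while a KL term aligns the two modalities' semantic class distributions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Project image and recipe features into a shared space and predict
    semantic (e.g., food-category) probabilities from each modality."""

    def __init__(self, img_dim=2048, rec_dim=1024, emb_dim=512, num_classes=103):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.rec_proj = nn.Linear(rec_dim, emb_dim)
        self.classifier = nn.Linear(emb_dim, num_classes)  # shared semantic head

    def forward(self, img_feat, rec_feat):
        img_emb = F.normalize(self.img_proj(img_feat), dim=-1)
        rec_emb = F.normalize(self.rec_proj(rec_feat), dim=-1)
        return img_emb, rec_emb, self.classifier(img_emb), self.classifier(rec_emb)

def retrieval_loss(img_emb, rec_emb, img_logits, rec_logits, margin=0.3, alpha=0.1):
    """Triplet loss over in-batch negatives plus a semantic-consistency
    term aligning the two modalities' class distributions."""
    sim = img_emb @ rec_emb.t()            # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)          # similarity of matched pairs
    off_diag = 1.0 - torch.eye(sim.size(0), device=sim.device)
    triplet = (F.relu(margin + sim - pos) * off_diag).mean()
    consistency = F.kl_div(
        F.log_softmax(img_logits, dim=-1),
        F.softmax(rec_logits, dim=-1),
        reduction="batchmean",
    )
    return triplet + alpha * consistency

# Toy usage with random features for a batch of 8 matched pairs
model = JointEmbedding()
img_emb, rec_emb, img_logits, rec_logits = model(torch.randn(8, 2048), torch.randn(8, 1024))
print(retrieval_loss(img_emb, rec_emb, img_logits, rec_logits))
```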
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.