Recognizing Multiple Ingredients in Food Images Using a
Single-Ingredient Classification Model
- URL: http://arxiv.org/abs/2401.14579v3
- Date: Mon, 19 Feb 2024 01:43:00 GMT
- Title: Recognizing Multiple Ingredients in Food Images Using a
Single-Ingredient Classification Model
- Authors: Kun Fu and Ying Dai
- Abstract summary: This study introduces an advanced approach for recognizing ingredients segmented from food images.
The method localizes candidate ingredient regions using locating and sliding-window techniques.
A novel model pruning method is proposed that enhances the efficiency of the classification model.
- Score: 4.409722014494348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing food images presents unique challenges due to the
variable spatial layout of ingredients and the shape changes they undergo with
different cooking and cutting methods. This study introduces an advanced
approach for recognizing ingredients segmented from food images. The method
localizes candidate ingredient regions using locating and sliding-window
techniques. These regions are then assigned to ingredient classes by a
convolutional neural network (CNN)-based single-ingredient classification model
trained on a dataset of single-ingredient images. To address the challenge of
processing speed in multi-ingredient recognition, a novel model pruning method
is proposed that enhances the efficiency of the classification model.
Subsequently, multi-ingredient identification is achieved through a
decision-making scheme incorporating two novel algorithms. The
single-ingredient image dataset, compiled in accordance with the book "New
Food Ingredients List FOODS 2021", comprises 9,982 images across 110 diverse
categories, emphasizing variety in ingredient shapes. In addition, a
multi-ingredient image dataset is developed to rigorously evaluate the
performance of our approach. Experimental results validate the effectiveness of
our method, particularly highlighting its improved capability in recognizing
multiple ingredients. This marks a significant advancement in the field of food
image analysis.
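The abstract describes the pipeline only at a high level. Below is a minimal sketch of the sliding-window recognition stage in Python, assuming a generic single-ingredient classifier and a simple confidence-threshold decision rule; `classify_patch`, the window size, the stride, and the threshold are illustrative stand-ins, not the paper's locating technique, pruning method, or two decision algorithms.

```python
import numpy as np

def sliding_windows(image, win=128, stride=64):
    """Yield square candidate regions from an RGB image array of shape (H, W, 3)."""
    h, w = image.shape[:2]
    for y in range(0, max(h - win, 0) + 1, stride):
        for x in range(0, max(w - win, 0) + 1, stride):
            yield image[y:y + win, x:x + win]

def recognize_ingredients(image, classify_patch, threshold=0.8):
    """Collect ingredient labels whose best window-level confidence clears a threshold.

    classify_patch is a hypothetical callable mapping a patch to a dict
    {ingredient_label: probability}, e.g. a softmax over the 110 classes.
    """
    best = {}  # label -> highest confidence observed over all windows
    for patch in sliding_windows(image):
        for label, prob in classify_patch(patch).items():
            if prob > best.get(label, 0.0):
                best[label] = prob
    return {label for label, prob in best.items() if prob >= threshold}
```

Keeping only the per-label maximum over windows is one simple way to aggregate overlapping detections; the paper's actual decision-making scheme may differ.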
Related papers
- Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models [48.821150379374714]
We introduce a large-scale, high-quality food image composite dataset, FC22k, which comprises 22,000 foreground, background, and ground truth ternary image pairs.
We propose a novel food image composition method, Foodfusion, which incorporates a Fusion Module for processing and integrating foreground and background information.
arXiv Detail & Related papers (2024-08-26T09:32:16Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared: one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
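Mean intersection over union (mIoU), the metric quoted above, is the per-class IoU averaged over all classes. A small sketch of the standard computation from two integer label maps (skipping classes absent from both maps is a common convention, assumed here):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU between two integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent from both maps: skip it
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```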
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well-established and high-performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Visual Aware Hierarchy Based Food Recognition [10.194167945992938]
We propose a new two-step food recognition system using Convolutional Neural Networks (CNNs) as the backbone architecture.
The food localization step is based on an implementation of the Faster R-CNN method to identify food regions.
In the food classification step, visually similar food categories can be clustered together automatically to generate a hierarchical structure.
arXiv Detail & Related papers (2020-12-06T20:25:31Z)
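The automatic clustering of visually similar categories in the entry above is not detailed; one common realization is agglomerative clustering on a distance matrix derived from a flat classifier's confusion matrix. A sketch with SciPy, where the distance definition and cut threshold are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_categories(confusion, cut=0.9):
    """Group food categories that a flat classifier tends to confuse.

    confusion: (K, K) row-normalized confusion matrix; returns one
    cluster id per category.
    """
    sim = (confusion + confusion.T) / 2.0   # symmetrize confusion rates
    dist = 1.0 - sim                        # high confusion -> small distance
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=cut, criterion="distance")
```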
- Cross-modal Retrieval and Synthesis (X-MRS): Closing the modality gap in shared subspace [21.33710150033949]
We propose a simple yet novel architecture for shared subspace learning, which is used to tackle the food image-to-recipe retrieval problem.
Experimental analysis on the public Recipe1M dataset shows that the subspace learned via the proposed method outperforms the current state of the art.
In order to demonstrate the representational power of the learned subspace, we propose a generative food image synthesis model conditioned on the embeddings of recipes.
arXiv Detail & Related papers (2020-12-02T17:27:00Z)
- Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long-length paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z)
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network [50.7720194859196]
We introduce the dataset ISIA Food-500, with 500 categories drawn from a list in Wikipedia and 399,726 images.
This dataset surpasses existing popular benchmark datasets in category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
arXiv Detail & Related papers (2020-08-13T02:48:27Z)
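The stacked global-local attention network above is described only in outline. The sketch below shows one plausible fusion of a globally pooled branch with an attention-weighted local branch over shared backbone features, in PyTorch; the single attention map and layer sizes are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class GlobalLocalHead(nn.Module):
    """Fuse global average-pooled features with attention-weighted local features."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # spatial attention logits
        self.fc = nn.Linear(2 * channels, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) feature map from a CNN backbone
        global_feat = feats.mean(dim=(2, 3))                          # (B, C)
        weights = torch.softmax(self.attn(feats).flatten(2), dim=-1)  # (B, 1, H*W)
        local_feat = (feats.flatten(2) * weights).sum(dim=-1)         # (B, C)
        return self.fc(torch.cat([global_feat, local_feat], dim=1))
```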
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
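Joint-embedding retrieval models like the one above are commonly trained with a bidirectional hinge ranking loss that pulls matched image-recipe pairs together and pushes mismatched ones apart. A minimal PyTorch sketch; the margin value and this exact loss form are assumptions, not necessarily what SCAN uses:

```python
import torch

def bidirectional_ranking_loss(img_emb, rec_emb, margin=0.3):
    """Hinge ranking loss over a batch of matched (image, recipe) embedding pairs.

    img_emb, rec_emb: (B, D) L2-normalized embeddings; row i of each matches.
    """
    sim = img_emb @ rec_emb.t()    # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)  # (B, 1) matched-pair scores
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2r = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)      # image -> recipe
    cost_r2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)  # recipe -> image
    return (cost_i2r + cost_r2i).mean()
```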