Food Ingredients Recognition through Multi-label Learning
- URL: http://arxiv.org/abs/2210.14147v1
- Date: Mon, 24 Oct 2022 10:18:26 GMT
- Title: Food Ingredients Recognition through Multi-label Learning
- Authors: Rameez Ismail, Zhaorui Yuan
- Abstract summary: The ability to recognize various food-items in a generic food plate is a key determinant for an automated diet assessment system.
We employ a deep multi-label learning approach and evaluate several state-of-the-art neural networks for their ability to detect an arbitrary number of ingredients in a dish image.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ability to recognize various food-items in a generic food plate is a key
determinant for an automated diet assessment system. This study motivates the
need for automated diet assessment and proposes a framework to achieve this.
Within this framework, we focus on one of the core functionalities to visually
recognize various ingredients. To this end, we employed a deep multi-label
learning approach and evaluated several state-of-the-art neural networks for
their ability to detect an arbitrary number of ingredients in a dish image. The
models evaluated in this work follow a definite meta-structure, consisting of
an encoder and a decoder component. Two distinct decoding schemes, one based on
global average pooling and the other on attention mechanism, are evaluated and
benchmarked. Whereas for encoding, several well-known architectures, including
DenseNet, EfficientNet, MobileNet, Inception and Xception, were employed. We
present promising preliminary results for deep learning-based ingredients
detection, using a challenging dataset, Nutrition5K, and establish a strong
baseline for future explorations.
Related papers
- Recognizing Multiple Ingredients in Food Images Using a
Single-Ingredient Classification Model [4.409722014494348]
This study introduces an advanced approach for recognizing ingredients segmented from food images.
The method localizes the candidate regions of the ingredients using the locating and sliding window techniques.
A novel model pruning method is proposed that enhances the efficiency of the classification model.
arXiv Detail & Related papers (2024-01-26T00:46:56Z) - FIRE: Food Image to REcipe generation [10.45344523054623]
Food computing aims to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image.
This paper proposes FIRE, a novel methodology tailored to recipe generation in the food computing domain.
We showcase two practical applications that can benefit from integrating FIRE with large language model prompting.
arXiv Detail & Related papers (2023-08-28T08:14:20Z) - Food Image Classification and Segmentation with Attention-based Multiple
Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
arXiv Detail & Related papers (2023-08-22T13:59:47Z) - Learning Structural Representations for Recipe Generation and Food
Retrieval [101.97397967958722]
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
Our proposed model can produce high-quality and coherent recipes, and achieve the state-of-the-art performance on the benchmark Recipe1M dataset.
arXiv Detail & Related papers (2021-10-04T06:36:31Z) - Joint Learning of Neural Transfer and Architecture Adaptation for Image
Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting network architectures tailored for each domain task along with weight finetuning benefits in both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers
and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well established and high performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z) - Visual Aware Hierarchy Based Food Recognition [10.194167945992938]
We propose a new two-step food recognition system using Convolutional Neural Networks (CNNs) as the backbone architecture.
The food localization step is based on an implementation of the Faster R-CNN method to identify food regions.
In the food classification step, visually similar food categories can be clustered together automatically to generate a hierarchical structure.
arXiv Detail & Related papers (2020-12-06T20:25:31Z) - Structure-Aware Generation Network for Recipe Generation from Images [142.047662926209]
We investigate an open research task of generating cooking instructions based on only food images and ingredients.
Target recipes are long-length paragraphs and do not have annotations on structure information.
We propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task.
arXiv Detail & Related papers (2020-09-02T10:54:25Z) - ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked
Global-Local Attention Network [50.7720194859196]
We introduce the dataset ISIA Food- 500 with 500 categories from the list in the Wikipedia and 399,726 images.
This dataset surpasses existing popular benchmark datasets by category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
arXiv Detail & Related papers (2020-08-13T02:48:27Z) - Deep Multimodal Neural Architecture Search [178.35131768344246]
We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks.
arXiv Detail & Related papers (2020-04-25T07:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.