Food Image Classification and Segmentation with Attention-based Multiple Instance Learning
- URL: http://arxiv.org/abs/2308.11452v1
- Date: Tue, 22 Aug 2023 13:59:47 GMT
- Title: Food Image Classification and Segmentation with Attention-based Multiple Instance Learning
- Authors: Valasia Vlachopoulou, Ioannis Sarafis, Alexandros Papadopoulos
- Abstract summary: The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
- Score: 51.279800092581844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The demand for accurate food quantification has increased in recent
years, driven by the needs of applications in dietary monitoring. At the same
time, computer vision approaches have exhibited great potential in automating
tasks within the food domain. Traditionally, the development of machine
learning models for these problems relies on training data sets with
pixel-level class annotations. However, this approach introduces challenges
arising from data collection and ground truth generation that quickly become
costly and error-prone since they must be performed in multiple settings and
for thousands of classes. To overcome these challenges, the paper presents a
weakly supervised methodology for training food image classification and
semantic segmentation models without relying on pixel-level annotations. The
proposed methodology is based on a multiple instance learning approach in
combination with an attention-based mechanism. At test time, the models are
used for classification and, concurrently, the attention mechanism generates
semantic heat maps which are used for food class segmentation. In the paper, we
conduct experiments on two meta-classes within the FoodSeg103 data set to
verify the feasibility of the proposed approach and we explore the functioning
properties of the attention mechanism.
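To make the approach concrete, below is a minimal PyTorch sketch of attention-based multiple instance learning pooling in the spirit of the description above: image patches act as bag instances, learned attention weights pool them into a bag embedding for classification, and the same weights can be reshaped into a coarse heat map over patch locations. The layer sizes, the 14x14 patch grid, and all names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, attn_dim=128, num_classes=2):
        super().__init__()
        # Scores one attention value per instance in the bag
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, bag):                       # bag: (num_instances, feat_dim)
        scores = self.attention(bag)              # (num_instances, 1)
        weights = torch.softmax(scores, dim=0)    # attention over instances
        bag_embedding = (weights * bag).sum(dim=0)
        logits = self.classifier(bag_embedding)   # bag-level class prediction
        return logits, weights.squeeze(-1)

model = AttentionMIL()
patches = torch.randn(196, 512)     # e.g. a 14x14 grid of patch features from a backbone
logits, attn = model(patches)
heat_map = attn.reshape(14, 14)     # attention weights reused as a coarse segmentation heat map

At test time, thresholding or upsampling such a heat map is one way to turn per-patch attention into a weakly supervised segmentation mask; the exact post-processing used in the paper is not assumed here.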
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Mitigating Bias: Enhancing Image Classification by Improving Model Explanations [9.791305104409057]
Deep learning models tend to rely heavily on simple and easily discernible features in the background of images.
We introduce a mechanism that encourages the model to allocate sufficient attention to the foreground.
Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images.
arXiv Detail & Related papers (2023-07-04T04:46:44Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
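As a point of reference for the BEiT-based approach, the sketch below runs an off-the-shelf BEiT semantic-segmentation checkpoint through the Hugging Face transformers API. The ADE20K-finetuned checkpoint and the image path are stand-ins for illustration only; the paper's FoodSeg103-trained weights are not assumed here.

import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForSemanticSegmentation

# Publicly available ADE20K checkpoint, used purely as an example backbone.
checkpoint = "microsoft/beit-base-finetuned-ade-640-640"
processor = BeitImageProcessor.from_pretrained(checkpoint)
model = BeitForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("meal_photo.jpg")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Logits have shape (batch, num_labels, H/4, W/4); argmax gives per-pixel class ids.
pred_mask = outputs.logits.argmax(dim=1)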
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Online Class-Incremental Learning For Real-World Food Image Classification [8.438092346233054]
Real-world food consumption patterns, shaped by cultural, economic, and personal influences, involve dynamic and evolving data.
Online Class Incremental Learning (OCIL) addresses the challenge of learning continuously from a single-pass data stream.
We present an attachable Dynamic Model Update (DMU) module designed for existing Experience Replay (ER) methods, which enables the selection of relevant images for model training.
arXiv Detail & Related papers (2023-01-12T19:00:27Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Simulating Personal Food Consumption Patterns using a Modified Markov Chain [5.874935571318868]
We propose a novel framework to simulate personal food consumption data patterns, leveraging the use of a modified Markov chain model and self-supervised learning.
Our experimental results demonstrate promising performance compared with random simulation and the original Markov chain method.
arXiv Detail & Related papers (2022-08-13T18:50:23Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Move-to-Data: A new Continual Learning approach with Deep CNNs, Application for image-class recognition [0.0]
It is necessary to pre-train the model in a "training recording phase" and then adjust it to newly arriving data.
We propose a fast continual learning layer at the end of the neural network.
arXiv Detail & Related papers (2020-06-12T13:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.