Related papers: Feature-Enhanced TResNet for Fine-Grained Food Image Classification

Feature-Enhanced TResNet for Fine-Grained Food Image Classification

URL: http://arxiv.org/abs/2507.12828v2
Date: Wed, 23 Jul 2025 02:14:58 GMT
Title: Feature-Enhanced TResNet for Fine-Grained Food Image Classification
Authors: Lulu Liu, Zhiyong Xiao,
Abstract summary: We propose Feature-Enhanced TResNet (FE-TResNet), a novel deep learning model designed to improve the accuracy of food image recognition in fine-grained scenarios.<n>FE-TResNet integrates a Style-based Recalibration Module (StyleRM) and Deep Channel-wise Attention (DCA) to enhance feature extraction and emphasize subtle distinctions between food items.
Score: 1.3351610617039973
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Food is not only essential to human health but also serves as a medium for cultural identity and emotional connection. In the context of precision nutrition, accurately identifying and classifying food images is critical for dietary monitoring, nutrient estimation, and personalized health management. However, fine-grained food classification remains challenging due to the subtle visual differences among similar dishes. To address this, we propose Feature-Enhanced TResNet (FE-TResNet), a novel deep learning model designed to improve the accuracy of food image recognition in fine-grained scenarios. Built on the TResNet architecture, FE-TResNet integrates a Style-based Recalibration Module (StyleRM) and Deep Channel-wise Attention (DCA) to enhance feature extraction and emphasize subtle distinctions between food items. Evaluated on two benchmark Chinese food datasets-ChineseFoodNet and CNFOOD-241-FE-TResNet achieved high classification accuracies of 81.37% and 80.29%, respectively. These results demonstrate its effectiveness and highlight its potential as a key enabler for intelligent dietary assessment and personalized recommendations in precision nutrition systems.

Related papers

Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion [69.84988999191343]
We introduce FastFood, a dataset with 84,446 images across 908 fast food categories, featuring ingredient and nutritional annotations.<n>We propose a new model-agnostic Visual-Ingredient Feature Fusion (VIF$2$) method to enhance nutrition estimation.
arXiv Detail & Related papers (2025-05-13T17:01:21Z)
MetaFood3D: 3D Food Dataset with Nutrition Values [52.16894900096017]
This dataset consists of 743 meticulously scanned and labeled 3D food objects across 131 categories.<n>Our MetaFood3D dataset emphasizes intra-class diversity and includes rich modalities such as textured mesh files, RGB-D videos, and segmentation masks.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images [63.314702537010355]
Self-reporting methods are often inaccurate and suffer from substantial bias. Recent work has explored using computer vision prediction systems to predict nutritional information from food images. This paper aims to enhance the efficacy of dietary intake estimation by leveraging various neural network architectures.
arXiv Detail & Related papers (2024-05-13T14:56:55Z)
Recognizing Multiple Ingredients in Food Images Using a Single-Ingredient Classification Model [4.409722014494348]
This study introduces an advanced approach for recognizing ingredients segmented from food images. The method localizes the candidate regions of the ingredients using the locating and sliding window techniques. A novel model pruning method is proposed that enhances the efficiency of the classification model.
arXiv Detail & Related papers (2024-01-26T00:46:56Z)
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images. We introduce NutritionVerse- Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information. We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
Diffusion Model with Clustering-based Conditioning for Food Image Generation [22.154182296023404]
Deep learning-based techniques are commonly used to perform image analysis such as food classification, segmentation, and portion size estimation. One potential solution is to use synthetic food images for data augmentation. In this paper, we propose an effective clustering-based training framework, named ClusDiff, for generating high-quality and representative food images.
arXiv Detail & Related papers (2023-09-01T01:40:39Z)
Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food. One challenge is that food items can overlap and mix, making them difficult to distinguish. Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional representation for Image Transformers (BEiT) The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models [66.77685168785152]
We present an optimized strategy for enabling improved rendering of thin 3D food models. Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method. While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects.
arXiv Detail & Related papers (2023-04-12T05:34:32Z)
Towards the Creation of a Nutrition and Food Group Based Image Database [58.429385707376554]
We propose a framework to create a nutrition and food group based image database. We design a protocol for linking food group based food codes in the U.S. Department of Agriculture's (USDA) Food and Nutrient Database for Dietary Studies (FNDDS) Our proposed method is used to build a nutrition and food group based image database including 16,114 food datasets.
arXiv Detail & Related papers (2022-06-05T02:41:44Z)
A Mobile Food Recognition System for Dietary Assessment [6.982738885923204]
We focus on developing a mobile friendly, Middle Eastern cuisine focused food recognition application for assisted living purposes. Using Mobilenet-v2 architecture for this task is beneficial in terms of both accuracy and the memory usage. The developed mobile application has potential to serve the visually impaired in automatic food recognition via images.
arXiv Detail & Related papers (2022-04-20T12:49:36Z)
Improving Dietary Assessment Via Integrated Hierarchy Food Classification [7.398060062678395]
We introduce a new food classification framework to improve the quality of predictions by integrating the information from multiple domains. Our method is validated on the modified VIPER-FoodNet (VFN) food image dataset by including associated energy and nutrient information.
arXiv Detail & Related papers (2021-09-06T20:59:58Z)
Towards Building a Food Knowledge Graph for Internet of Food [66.57235827087092]
We review the evolution of food knowledge organization, from food classification to food to food knowledge graphs. Food knowledge graphs play an important role in food search and Question Answering (QA), personalized dietary recommendation, food analysis and visualization. Future directions for food knowledge graphs cover several fields such as multimodal food knowledge graphs and food intelligence.
arXiv Detail & Related papers (2021-07-13T06:26:53Z)
Visual Aware Hierarchy Based Food Recognition [10.194167945992938]
We propose a new two-step food recognition system using Convolutional Neural Networks (CNNs) as the backbone architecture. The food localization step is based on an implementation of the Faster R-CNN method to identify food regions. In the food classification step, visually similar food categories can be clustered together automatically to generate a hierarchical structure.
arXiv Detail & Related papers (2020-12-06T20:25:31Z)
MyFood: A Food Segmentation and Classification System to Aid Nutritional Monitoring [1.5469452301122173]
The absence of food monitoring has contributed significantly to the increase in the population's weight. Some solutions have been proposed in computer vision to recognize food images, but few are specialized in nutritional monitoring. This work presents the development of an intelligent system that classifies and segments food presented in images to help the automatic monitoring of user diet and nutritional intake.
arXiv Detail & Related papers (2020-12-05T17:40:05Z)
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another. We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities. We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.