MultiFoodChat: A potential new paradigm for intelligent food quality inspection
- URL: http://arxiv.org/abs/2510.13889v1
- Date: Tue, 14 Oct 2025 03:39:03 GMT
- Title: MultiFoodChat: A potential new paradigm for intelligent food quality inspection
- Authors: Yue Hu, Guohang Zhuang
- Abstract summary: MultiFoodChat is a dialogue-driven multi-agent reasoning framework for zero-shot food recognition. An Object Perception Token (OPT) captures fine-grained visual attributes, while an Interactive Reasoning Agent (IRA) dynamically interprets contextual cues to refine predictions. Experiments on multiple public food datasets demonstrate that MultiFoodChat achieves superior recognition accuracy and interpretability compared with existing unsupervised and few-shot methods.
- Score: 7.966483944010341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Food image classification plays a vital role in intelligent food quality inspection, dietary assessment, and automated monitoring. However, most existing supervised models rely heavily on large labeled datasets and exhibit limited generalization to unseen food categories. To overcome these challenges, this study introduces MultiFoodChat, a dialogue-driven multi-agent reasoning framework for zero-shot food recognition. The framework integrates vision-language models (VLMs) and large language models (LLMs) to enable collaborative reasoning through multi-round visual-textual dialogues. An Object Perception Token (OPT) captures fine-grained visual attributes, while an Interactive Reasoning Agent (IRA) dynamically interprets contextual cues to refine predictions. This multi-agent design allows flexible and human-like understanding of complex food scenes without additional training or manual annotations. Experiments on multiple public food datasets demonstrate that MultiFoodChat achieves superior recognition accuracy and interpretability compared with existing unsupervised and few-shot methods, highlighting its potential as a new paradigm for intelligent food quality inspection and analysis.
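The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of how such a dialogue-driven multi-agent loop could be wired together. It assumes generic `vlm` and `llm` callables rather than any specific model API, and the prompts, round budget, and stop criterion are illustrative assumptions, not the authors' implementation.

```python
"""Minimal, hypothetical sketch of a dialogue-driven multi-agent
zero-shot food classifier in the spirit of MultiFoodChat.
All prompts, function names, and the stop criterion are assumptions;
they are NOT taken from the paper's implementation."""

from dataclasses import dataclass, field
from typing import Callable, List

# Two backbones, as in the abstract: a vision-language model (VLM) that
# answers questions about an image, and a large language model (LLM)
# that reasons over the dialogue so far.
VLM = Callable[[str, str], str]   # (image_path, question) -> answer
LLM = Callable[[str], str]        # (prompt) -> response


@dataclass
class Dialogue:
    """Accumulated multi-round visual-textual dialogue."""
    turns: List[str] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")

    def transcript(self) -> str:
        return "\n".join(self.turns)


def classify_food(image_path: str, candidate_labels: List[str],
                  vlm: VLM, llm: LLM, max_rounds: int = 3) -> str:
    """Run a few perception/reasoning rounds, then ask the LLM to commit
    to one of the candidate labels (zero-shot: no training involved)."""
    dialogue = Dialogue()

    # Round 0: coarse perception of fine-grained visual attributes
    # (loosely playing the role of the paper's Object Perception Token).
    attrs = vlm(image_path, "Describe the colors, textures, shapes and "
                            "ingredients visible in this food image.")
    dialogue.add("perception", attrs)

    for _ in range(max_rounds):
        # Reasoning step (loosely the Interactive Reasoning Agent): the
        # LLM either asks a follow-up visual question or commits.
        prompt = (f"Dialogue so far:\n{dialogue.transcript()}\n"
                  f"Candidate labels: {', '.join(candidate_labels)}\n"
                  "If you need more visual evidence, reply with "
                  "QUESTION: <a question about the image>. "
                  "Otherwise reply with ANSWER: <one label>.")
        response = llm(prompt)
        dialogue.add("reasoning", response)

        if response.startswith("ANSWER:"):
            return response.removeprefix("ANSWER:").strip()

        # Ask the VLM the follow-up question and continue the dialogue.
        question = response.removeprefix("QUESTION:").strip()
        dialogue.add("perception", vlm(image_path, question))

    # Fallback: force a final decision once the round budget is spent.
    final = llm(f"{dialogue.transcript()}\nPick exactly one label from: "
                f"{', '.join(candidate_labels)}.")
    return final.strip()
```

Because the VLM and LLM are passed in as plain callables, the sketch runs with any backend (or with stubbed responses for testing) and requires no training or manual annotation, mirroring the zero-shot setting the abstract claims.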
Related papers
- LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets [54.527878056610156]
We present a framework empowered with large language models (LLMs) to address these challenges in food recognition. We first leverage LLMs to parse food images to generate food titles and ingredients. Then, we project the generated texts and food images from different domains to a shared embedding space to maximize the pair similarities.
arXiv Detail & Related papers (2025-11-20T04:38:56Z)
- Informatics for Food Processing [0.5266869303483376]
The chapter emphasizes the transformative role of machine learning, artificial intelligence (AI), and data science in advancing food informatics. To address these issues, the chapter presents novel computational approaches, including FoodProX, a random forest model trained on nutrient composition data to infer processing levels. A key contribution of the chapter is a novel case study using the Open Food Facts database, showcasing how multimodal AI models can integrate structured and unstructured data to classify foods at scale.
arXiv Detail & Related papers (2025-05-20T20:44:31Z)
- A SAM based Tool for Semi-Automatic Food Annotation [0.0]
We present a demo of a semi-automatic food image annotation tool leveraging the Segment Anything Model (SAM).
The tool enables prompt-based food segmentation via user interactions, promoting user engagement and allowing them to further categorise food items within meal images.
We also release a fine-tuned version of SAM's mask decoder, dubbed MealSAM, with the ViT-B backbone tailored specifically for food image segmentation.
arXiv Detail & Related papers (2024-10-11T11:50:10Z)
- MetaFood3D: 3D Food Dataset with Nutrition Values [52.16894900096017]
This dataset consists of 743 meticulously scanned and labeled 3D food objects across 131 categories. Our MetaFood3D dataset emphasizes intra-class diversity and includes rich modalities such as textured mesh files, RGB-D videos, and segmentation masks.
arXiv Detail & Related papers (2024-09-03T15:02:52Z)
- RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels. Uni-Food is designed to provide a more holistic approach to food data analysis. We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
- FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture [60.51749998013166]
We introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China.
We evaluate vision-language Models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions.
Our findings highlight that understanding food and its cultural implications remains a challenging and under-explored direction.
arXiv Detail & Related papers (2024-06-16T17:59:32Z)
- FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
arXiv Detail & Related papers (2023-12-22T11:56:22Z)
- Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis [17.333822848423708]
This study explores the application of multimodal ChatGPT within the realm of dietary assessment.
By guiding the model with specific language prompts, GPT-4V shifts from recognizing common staples like rice and bread to accurately identifying regional dishes like banku and ugali.
GPT-4V can leverage surrounding objects as scale references to deduce the portion sizes of food items, further enhancing its accuracy in translating food weight into nutritional content.
arXiv Detail & Related papers (2023-12-14T01:26:45Z)
- Food Image Classification and Segmentation with Attention-based Multiple Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
arXiv Detail & Related papers (2023-08-22T13:59:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.