LLaVA-Chef: A Multi-modal Generative Model for Food Recipes
- URL: http://arxiv.org/abs/2408.16889v1
- Date: Thu, 29 Aug 2024 20:20:49 GMT
- Title: LLaVA-Chef: A Multi-modal Generative Model for Food Recipes
- Authors: Fnu Mohbat, Mohammed J. Zaki,
- Abstract summary: Large language models (LLMs) have paved the way for Natural Language Processing approaches to delve deeper into food-related tasks.
This work proposes LLaVA-Chef, a novel model trained on a curated dataset of diverse recipe prompts.
A detailed qualitative analysis reveals that LLaVA-Chef generates more detailed recipes with precise ingredient mentions.
- Score: 17.705244174235045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving landscape of online recipe sharing within a globalized context, there has been a notable surge in research towards comprehending and generating food recipes. Recent advancements in large language models (LLMs) like GPT-2 and LLaVA have paved the way for Natural Language Processing (NLP) approaches to delve deeper into various facets of food-related tasks, encompassing ingredient recognition and comprehensive recipe generation. Despite impressive performance and multi-modal adaptability of LLMs, domain-specific training remains paramount for their effective application. This work evaluates existing LLMs for recipe generation and proposes LLaVA-Chef, a novel model trained on a curated dataset of diverse recipe prompts in a multi-stage approach. First, we refine the mapping of visual food image embeddings to the language space. Second, we adapt LLaVA to the food domain by fine-tuning it on relevant recipe data. Third, we utilize diverse prompts to enhance the model's recipe comprehension. Finally, we improve the linguistic quality of generated recipes by penalizing the model with a custom loss function. LLaVA-Chef demonstrates impressive improvements over pretrained LLMs and prior works. A detailed qualitative analysis reveals that LLaVA-Chef generates more detailed recipes with precise ingredient mentions, compared to existing approaches.
Related papers
- Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task [18.229223484919775]
Large Language Models (LLMs) have shown promise in various creative domains, including culinary arts.
This study focuses on cuisine transfer-applying elements of one cuisine to another to assess LLMs' culinary creativity.
arXiv Detail & Related papers (2024-11-04T11:31:18Z) - RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels.
Uni-Food is designed to provide a more holistic approach to food data analysis.
We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z) - SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z) - FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
arXiv Detail & Related papers (2023-12-22T11:56:22Z) - FIRE: Food Image to REcipe generation [10.45344523054623]
Food computing aims to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image.
This paper proposes FIRE, a novel methodology tailored to recipe generation in the food computing domain.
We showcase two practical applications that can benefit from integrating FIRE with large language model prompting.
arXiv Detail & Related papers (2023-08-28T08:14:20Z) - Large Language Models as Sous Chefs: Revising Recipes with GPT-3 [56.7155146252028]
We focus on recipes as an example of complex, diverse, and widely used instructions.
We develop a prompt grounded in the original recipe and ingredients list that breaks recipes down into simpler steps.
We also contribute an Amazon Mechanical Turk task that is carefully designed to reduce fatigue while collecting human judgment of the quality of recipe revisions.
arXiv Detail & Related papers (2023-06-24T14:42:43Z) - KitchenScale: Learning to predict ingredient quantities from recipe
contexts [13.001618172288198]
KitchenScale is a model that predicts a target ingredient's quantity and measurement unit given its recipe context.
We formulate an ingredient quantity prediction task that consists of three sub-tasks which are ingredient measurement type classification, unit classification, and quantity regression task.
Experiments with our newly constructed dataset and recommendation examples demonstrate KitchenScale's understanding of various recipe contexts.
arXiv Detail & Related papers (2023-04-21T04:28:16Z) - Counterfactual Recipe Generation: Exploring Compositional Generalization
in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulties in modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z) - Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.