Optimal Meal Schedule for a Local Nonprofit Using LLM-Aided Data Extraction
- URL: http://arxiv.org/abs/2511.18483v1
- Date: Sun, 23 Nov 2025 15:05:21 GMT
- Title: Optimal Meal Schedule for a Local Nonprofit Using LLM-Aided Data Extraction
- Authors: Sergio Marin, Nhu Nguyen, Max Zheng, Christina M. Weaver
- Abstract summary: We present a data-driven pipeline developed in collaboration with the Power Packs Project, a nonprofit addressing food insecurity in local communities. The system integrates data extraction from PDFs, large language models for ingredient standardization, and binary integer programming to generate a 15-week recipe schedule. All 157 recipes were mapped to a nutritional database and assigned estimated and predicted costs using historical invoice data and category-specific inflation adjustments.
- Score: 3.9271139410453366
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a data-driven pipeline developed in collaboration with the Power Packs Project, a nonprofit addressing food insecurity in local communities. The system integrates data extraction from PDFs, large language models for ingredient standardization, and binary integer programming to generate a 15-week recipe schedule that minimizes projected wholesale costs while meeting nutritional constraints. All 157 recipes were mapped to a nutritional database and assigned estimated and predicted costs using historical invoice data and category-specific inflation adjustments. The model effectively handles real-world price volatility and is structured for easy updates as new recipes or cost data become available. Optimization results show that constraint-based selection yields nutritionally balanced and cost-efficient plans under uncertainty. To facilitate real-time decision-making, we deployed a searchable web platform that integrates analytical models into daily operations by enabling staff to explore recipes by ingredient, category, or through an optimized meal plan.
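The binary selection idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: the recipe names, costs, protein values, the single protein constraint, and the function `cheapest_schedule` are all hypothetical, and exhaustive enumeration stands in for a real integer-programming solver, which would be needed at the scale of 157 recipes and 15 weeks.

```python
from itertools import combinations

# Hypothetical toy data: (name, cost in dollars, protein in grams).
RECIPES = [
    ("bean chili", 4.0, 30),
    ("pasta bake", 3.0, 18),
    ("veggie stir-fry", 3.5, 15),
    ("chicken rice", 5.0, 35),
    ("lentil soup", 2.5, 22),
]


def cheapest_schedule(recipes, weeks, min_avg_protein):
    """Pick `weeks` distinct recipes with minimum total cost.

    Equivalent to a binary program with x_i in {0, 1}:
      minimize   sum(cost_i * x_i)
      subject to sum(x_i) == weeks
                 sum(protein_i * x_i) >= weeks * min_avg_protein
    Enumerating every subset is only viable for tiny instances.
    Returns (total_cost, [names]) or None if no subset is feasible.
    """
    best = None
    for subset in combinations(recipes, weeks):
        total_protein = sum(r[2] for r in subset)
        if total_protein < weeks * min_avg_protein:
            continue  # violates the nutritional constraint
        total_cost = sum(r[1] for r in subset)
        if best is None or total_cost < best[0]:
            best = (total_cost, [r[0] for r in subset])
    return best


result = cheapest_schedule(RECIPES, weeks=3, min_avg_protein=25)
print(result)  # (10.5, ['pasta bake', 'chicken rice', 'lentil soup'])
```

Each recipe is either in the schedule or not, which is what makes the decision variables binary; the cheapest three recipes overall fail the protein requirement here, so the constraint changes the answer.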
Related papers
- Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice [109.9635246405237]
We show that experimental conclusions about data quality can flip with even minor adjustments to training hyperparameters. We introduce a simple patch to the evaluation protocol: using reduced learning rates for proxy model training. Empirically, we validate this approach across 23 data recipes covering four critical dimensions of data curation.
arXiv Detail & Related papers (2025-12-30T23:02:44Z) - KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models [17.705244174235045]
Recent advances in large language models (LLMs) and the abundance of food data have resulted in studies to improve food understanding. We introduce KERL, a unified system that leverages food KGs and LLMs to provide personalized food recommendations. We show that our proposed KG-augmented LLM significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-05-20T17:19:57Z) - Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework [10.317740844867913]
We build a simulator based on 472 language model pre-training runs with varying data compositions from the SlimPajama dataset. We observe that even simple acquisition functions can enable principled training decisions across training models from 20M to 1B kernels.
arXiv Detail & Related papers (2025-03-26T22:19:47Z) - RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels. Uni-Food is designed to provide a more holistic approach to food data analysis. We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z) - Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL is stable under a full spectrum of label sparsity, computational budget, and various other ablations.
arXiv Detail & Related papers (2024-04-19T10:10:39Z) - Data-Juicer: A One-Stop Data Processing System for Large Language Models [73.27731037450995]
A data recipe is a mixture of data from different sources for training Large Language Models (LLMs).
We build a new system named Data-Juicer, with which we can efficiently generate diverse data recipes.
The data recipes derived with Data-Juicer gain notable improvements on state-of-the-art LLMs.
arXiv Detail & Related papers (2023-09-05T08:22:07Z) - Learning to Substitute Ingredients in Recipes [15.552549060863523]
Recipe personalization through ingredient substitution has the potential to help people meet their dietary needs and preferences, avoid potential allergens, and ease culinary exploration in everyone's kitchen.
We build a benchmark, composed of a dataset of substitution pairs with standardized splits, evaluation metrics, and baselines.
We introduce Graph-based Ingredient Substitution Module (GISMo), a novel model that leverages the context of a recipe as well as generic ingredient relational information encoded within a graph to rank plausible substitutions.
We show through comprehensive experimental validation that GISMo surpasses the best performing baseline by a large margin in terms of mean reciprocal rank.
arXiv Detail & Related papers (2023-02-15T21:49:23Z) - Structured Vision-Language Pretraining for Computational Cooking [54.0571416522547]
Vision-Language Pretraining and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks.
We propose to leverage these techniques for structured-text based computational cuisine tasks.
arXiv Detail & Related papers (2022-12-08T13:37:17Z) - Optimizing Data Collection for Machine Learning [87.37252958806856]
Modern deep learning systems require huge data sets to achieve impressive performance.
Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay.
We propose a new paradigm for modeling the data collection as a formal optimal data collection problem.
arXiv Detail & Related papers (2022-10-03T21:19:05Z) - An Open-Source Dataset on Dietary Behaviors and DASH Eating Plan Optimization Constraints [0.29298205115761694]
We provide a modified dataset based on dietary behaviors of different groups of people, their demographics, and pre-existing conditions.
We additionally provide tailored datasets for hypertension and pre-diabetic patients as groups of interest who may benefit from targeted diets.
arXiv Detail & Related papers (2020-10-15T05:25:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.