Related papers: DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

URL: http://arxiv.org/abs/2602.11089v1
Date: Wed, 11 Feb 2026 17:56:15 GMT
Title: DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
Authors: Yicheng Chen, Zerun Ma, Xinchen Xie, Yining Li, Kai Chen
Abstract summary: We present DataChef-32B, which performs online reinforcement learning using a proxy reward that predicts downstream performance for candidate recipes. The recipe from DataChef-32B adapts Qwen3-1.7B-Base to the math domain, achieving 66.7 on AIME'25 and surpassing Qwen3-1.7B.
Score: 27.75273528849027
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the current landscape of Large Language Models (LLMs), the curation of large-scale, high-quality training data is a primary driver of model performance. A key lever is the data recipe, which comprises a data processing pipeline to transform raw sources into training corpora. Despite the growing use of LLMs to automate individual data processing steps, such as data synthesis and filtering, the overall design of data recipes remains largely manual and labor-intensive, requiring substantial human expertise and iteration. To bridge this gap, we formulate end-to-end data recipe generation for LLM adaptation. Given a target benchmark and a pool of available data sources, a model is required to output a complete data recipe that adapts a base LLM to the target task. We present DataChef-32B, which performs online reinforcement learning using a proxy reward that predicts downstream performance for candidate recipes. Across six held-out tasks, DataChef-32B produces practical recipes that reach comparable downstream performance to those curated by human experts. Notably, the recipe from DataChef-32B adapts Qwen3-1.7B-Base to the math domain, achieving 66.7 on AIME'25 and surpassing Qwen3-1.7B. This work sheds new light on automating LLM training and developing self-evolving AI systems.

Related papers

Augmented Fine-Tuned LLMs for Enhanced Recruitment Automation [0.4349640169711269]
Large Language Models (LLMs) were fine-tuned to improve accuracy and efficiency. System creates a synthetic dataset that uses a standardized format. Phy-4 model achieved the highest F1 score of 90.62%, indicating exceptional precision and recall in recruitment tasks.
arXiv Detail & Related papers (2025-09-07T20:18:31Z)
Transferable text data distillation by trajectory matching [27.826518926355295]
The data distillation method aims to synthesize a small number of data samples to achieve the training effect of the full data set. In this work, we proposed a method that involves learning pseudo prompt data based on trajectory matching. Evaluations on two benchmarks, including ARC-Easy and MMLU instruction tuning datasets, established the superiority of our distillation approach over the SOTA data selection method LESS.
arXiv Detail & Related papers (2025-04-14T02:39:26Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models. We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
arXiv Detail & Related papers (2024-06-17T17:42:57Z)
How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs). In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
Data-Juicer: A One-Stop Data Processing System for Large Language Models [73.27731037450995]
A data recipe is a mixture of data from different sources for training Large Language Models (LLMs). We build a new system named Data-Juicer, with which we can efficiently generate diverse data recipes. The data recipes derived with Data-Juicer gain notable improvements on state-of-the-art LLMs.
arXiv Detail & Related papers (2023-09-05T08:22:07Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyzes the weakness of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
Structured Vision-Language Pretraining for Computational Cooking [54.0571416522547]
Vision-Language Pretraining and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. We propose to leverage these techniques for structured-text based computational cuisine tasks.
arXiv Detail & Related papers (2022-12-08T13:37:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
