Related papers: Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents

Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents

URL: http://arxiv.org/abs/2602.10226v1
Date: Tue, 10 Feb 2026 19:16:52 GMT
Title: Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents
Authors: Haochen Wang, Yi Wu, Daryl Chang, Li Wei, Lukasz Heldt,
Abstract summary: We propose a self-evolving system to autonomously generate, train, and deploy complex model changes.<n>Our agents act as specialized Machine Learning Engineers (MLEs)<n>The effectiveness of this approach is demonstrated through several successful production launches at YouTube.
Score: 18.707716142982992
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Optimizing large-scale machine learning systems, such as recommendation models for global video platforms, requires navigating a massive hyperparameter search space and, more critically, designing sophisticated optimizers, architectures, and reward functions to capture nuanced user behaviors. Achieving substantial improvements in these areas is a non-trivial task, traditionally relying on extensive manual iterations to test new hypotheses. We propose a self-evolving system that leverages Large Language Models (LLMs), specifically those from Google's Gemini family, to autonomously generate, train, and deploy high-performing, complex model changes within an end-to-end automated workflow. The self-evolving system is comprised of an Offline Agent (Inner Loop) that performs high-throughput hypothesis generation using proxy metrics, and an Online Agent (Outer Loop) that validates candidates against delayed north star business metrics in live production. Our agents act as specialized Machine Learning Engineers (MLEs): they exhibit deep reasoning capabilities, discovering novel improvements in optimization algorithms and model architecture, and formulating innovative reward functions that target long-term user engagement. The effectiveness of this approach is demonstrated through several successful production launches at YouTube, confirming that autonomous, LLM-driven evolution can surpass traditional engineering workflows in both development velocity and model performance.

Related papers

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection.<n>However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight.<n>We introduce textscEmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
From Agentification to Self-Evolving Agentic AI for Wireless Networks: Concepts, Approaches, and Future Research Directions [70.72279728350763]
Self-evolving agentic artificial intelligence (AI) offers a new paradigm for future wireless systems.<n>Unlike static AI models, self-evolving agents embed an autonomous evolution cycle that updates models, tools, and in response to environmental dynamics.<n>This paper presents a comprehensive overview of self-evolving agentic AI, highlighting its layered architecture, life cycle, and key techniques.
arXiv Detail & Related papers (2025-10-07T05:45:25Z)
Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents [3.344730946122235]
We propose an innovative pipeline utilizing Large Language Model (LLM) agents to automate data-driven modeling and analysis.<n>We evaluate two LLM-agent frameworks: a multi-agent system featuring specialized collaborative agents, and a single-agent system based on the Reasoning and Acting (ReAct) paradigm.
arXiv Detail & Related papers (2025-10-01T19:28:35Z)
SEA: Self-Evolution Agent with Step-wise Reward for Computer Use [6.056153018209402]
We propose the Self-Evolution Agent (SEA) for computer use, and to develop this agent, we propose creative methods in data generation, reinforcement learning, and model enhancement.<n>Based on our proposed innovation of data generation, training strategy, and enhancement, we get the Selfevolution Agent (SEA) for computer use with only 7B parameters.
arXiv Detail & Related papers (2025-08-06T02:57:22Z)
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design [11.639825726501659]
Large language models (LLMs) can autonomously discover high-performings at a fraction of the traditional cost.<n>We propose a hybrid framework that combines verbal and numerical guidance.<n>Our method outperforms state-of-the-art (SOTA) baselines across various optimization tasks.
arXiv Detail & Related papers (2025-05-18T07:48:47Z)
WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model [55.276852838877346]
Self-evolving agents are trained on trajectories sampled autonomously based on their own policies.<n>We propose a novel framework that introduces a co-evolving World Model LLM.<n>This world model predicts the next observation based on the current observation and action within the web environment.
arXiv Detail & Related papers (2025-04-23T02:54:31Z)
ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning [84.69651852838794]
Tool learning allows Large Language Models (LLMs) to leverage external tools for solving complex user tasks.<n>We propose ToolACE-R, a novel framework that includes both model-aware iterative training and adaptive refinement for tool learning.<n>We conduct extensive experiments across several benchmark datasets, showing that ToolACE-R achieves competitive performance compared to advanced API-based models.
arXiv Detail & Related papers (2025-04-02T06:38:56Z)
Improving Retrospective Language Agents via Joint Policy Gradient Optimization [57.35348425288859]
RetroAct is a framework that jointly optimize both task-planning and self-reflective evolution capabilities in language agents.<n>We develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning.<n>We conduct extensive experiments across various testing environments, demonstrating RetroAct has substantial improvements in task performance and decision-making processes.
arXiv Detail & Related papers (2025-03-03T12:54:54Z)
On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly. In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.67321902882617]
We propose a viable path for training open-source LLMs capable of optimization modeling and developing solver codes.<n>This work also introduces IndustryOR, the first industrial benchmark for evaluating LLMs in solving practical OR problems.
arXiv Detail & Related papers (2024-05-28T01:55:35Z)
TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System [14.019244136838017]
TrainerAgent is a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimizing them from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online service. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development.
arXiv Detail & Related papers (2023-11-11T17:39:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.