Related papers: Towards Reliable ML Feature Engineering via Planning in Constrained-Topology of LLM Agents

Towards Reliable ML Feature Engineering via Planning in Constrained-Topology of LLM Agents

URL: http://arxiv.org/abs/2601.10820v1
Date: Thu, 15 Jan 2026 19:33:42 GMT
Title: Towards Reliable ML Feature Engineering via Planning in Constrained-Topology of LLM Agents
Authors: Himanshu Thakur, Anusha Kamath, Anurag Muthyala, Dhwani Sanmukhani, Smruthi Mukund, Jay Katukuri,
Abstract summary: Recent advances in code generation models have unlocked unprecedented opportunities for automating feature engineering.<n>Their adoption in real-world ML teams remains constrained by critical challenges.<n>We address these challenges with a planner-guided, constrained-topology multi-agent framework.
Score: 1.991571265620589
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in code generation models have unlocked unprecedented opportunities for automating feature engineering, yet their adoption in real-world ML teams remains constrained by critical challenges: (i) the scarcity of datasets capturing the iterative and complex coding processes of production-level feature engineering, (ii) limited integration and personalization of widely used coding agents, such as CoPilot and Devin, with a team's unique tools, codebases, workflows, and practices, and (iii) suboptimal human-AI collaboration due to poorly timed or insufficient feedback. We address these challenges with a planner-guided, constrained-topology multi-agent framework that generates code for repositories in a multi-step fashion. The LLM-powered planner leverages a team's environment, represented as a graph, to orchestrate calls to available agents, generate context-aware prompts, and use downstream failures to retroactively correct upstream artifacts. It can request human intervention at critical steps, ensuring generated code is reliable, maintainable, and aligned with team expectations. On a novel in-house dataset, our approach achieves 38% and 150% improvement in the evaluation metric over manually crafted and unplanned workflows respectively. In practice, when building features for recommendation models serving over 120 million users, our approach has delivered real-world impact by reducing feature engineering cycles from three weeks to a single day.

Related papers

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection.<n>However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight.<n>We introduce textscEmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering [0.27195102129094995]
Current approaches to AI coding agents blur the lines between the Large Language Model and the agent itself.<n>This paper proposes setting the control boundary such that the LLM is treated as a component of the environment environment.
arXiv Detail & Related papers (2025-12-18T15:28:21Z)
Automated Multi-Agent Workflows for RTL Design [13.229297320467332]
We present VeriMaAS, a multi-agent framework designed to automatically compose agentic tasks for RTL code generation.<n>Our method improves synthesis performance by 5-7% for pass@k over fine-tuned baselines.
arXiv Detail & Related papers (2025-09-24T14:44:28Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [62.854649499866774]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.<n>We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents.<n>We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z)
Human-In-the-Loop Software Development Agents [12.830816751625829]
Large Language Models (LLMs)-based multi-agent paradigms for software engineering are introduced to automatically resolve software development tasks.<n>In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development.<n>We design, implement, and deploy the HULA framework into Atlassian for internal uses.
arXiv Detail & Related papers (2024-11-19T23:22:33Z)
MaCTG: Multi-Agent Collaborative Thought Graph for Automatic Programming [10.461509044478278]
MaCTG (MultiAgent Collaborative Thought Graph) is a novel multi-agent framework that employs a dynamic graph structure.<n>It autonomously assigns agent roles based on programming requirements, dynamically refines task distribution, and systematically verifies and integrates project-level code.<n>MaCTG significantly reduced operational costs by 89.09% compared to existing multi-agent frameworks.
arXiv Detail & Related papers (2024-10-25T01:52:15Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.<n>We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.<n>We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction [38.683780057806516]
Recent developments in pretrained large language models (LLMs) applied robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints. Experiments across three different simulated 3D domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints on continuous parameters much more efficiently and effectively than existing baselines.
arXiv Detail & Related papers (2024-06-08T20:56:14Z)
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents [50.82665351100067]
FlowGen is a code generation framework that emulates software process models based on multiple Large Language Model (LLM) agents. We evaluate FlowGenScrum on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
arXiv Detail & Related papers (2024-03-23T14:04:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.