DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation
- URL: http://arxiv.org/abs/2505.14163v1
- Date: Tue, 20 May 2025 10:16:21 GMT
- Title: DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation
- Authors: He Wang, Alexander Hanbo Li, Yiqun Hu, Sheng Zhang, Hideo Kobayashi, Jiani Zhang, Henry Zhu, Chung-Wei Hang, Patrick Ng,
- Abstract summary: Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. We develop a novel inference-time optimization framework, referred to as DSMentor, to enhance LLM agent performance. Our work underscores the importance of developing effective strategies for accumulating and utilizing knowledge during inference.
- Score: 59.79833777420334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. Recent studies primarily focus on enhancing in-context learning through improved search, sampling, and planning techniques, while overlooking the importance of the order in which problems are tackled during inference. In this work, we develop a novel inference-time optimization framework, referred to as DSMentor, which leverages curriculum learning -- a strategy that introduces simpler tasks first and progressively moves to more complex ones as the learner improves -- to enhance LLM agent performance in challenging data science tasks. Our mentor-guided framework organizes data science tasks in order of increasing difficulty and incorporates a growing long-term memory to retain prior experiences, guiding the agent's learning progression and enabling more effective utilization of accumulated knowledge. We evaluate DSMentor through extensive experiments on DSEval and QRData benchmarks. Experiments show that DSMentor using Claude-3.5-Sonnet improves the pass rate by up to 5.2% on DSEval and QRData compared to baseline agents. Furthermore, DSMentor demonstrates stronger causal reasoning ability, improving the pass rate by 8.8% on the causality problems compared to GPT-4 using Program-of-Thoughts prompts. Our work underscores the importance of developing effective strategies for accumulating and utilizing knowledge during inference, mirroring the human learning process and opening new avenues for improving LLM performance through curriculum-based inference optimization.
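The abstract describes a concrete inference-time loop: score tasks by difficulty, attempt them in easy-to-hard order, and grow a long-term memory of prior episodes that is retrieved as context for later, harder tasks. Below is a minimal Python sketch of that loop, not DSMentor's actual implementation; `estimate_difficulty`, `llm_solve`, and the word-overlap retrieval are hypothetical placeholders for the paper's mentor-assigned difficulty scores, agent calls, and memory retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One prior task attempt kept in long-term memory."""
    task: str
    solution: str
    passed: bool


@dataclass
class LongTermMemory:
    episodes: list[Episode] = field(default_factory=list)

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, task: str, k: int = 3) -> list[Episode]:
        # Placeholder relevance ranking via word overlap; a real system
        # would likely use embedding similarity instead.
        words = set(task.lower().split())
        ranked = sorted(
            self.episodes,
            key=lambda e: len(words & set(e.task.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def estimate_difficulty(task: str) -> float:
    """Hypothetical stand-in for a mentor-assigned difficulty score."""
    return float(len(task))  # trivial proxy, for illustration only


def llm_solve(task: str, context: list[Episode]) -> tuple[str, bool]:
    """Hypothetical stand-in for the LLM agent; returns (code, passed)."""
    raise NotImplementedError("plug in an actual LLM agent call here")


def curriculum_inference(tasks: list[str]) -> LongTermMemory:
    memory = LongTermMemory()
    # Curriculum step: attempt tasks in order of increasing difficulty.
    for task in sorted(tasks, key=estimate_difficulty):
        context = memory.retrieve(task)
        solution, passed = llm_solve(task, context)
        # Accumulation step: keep every episode so harder tasks can
        # reuse experience from easier ones solved earlier.
        memory.add(Episode(task, solution, passed))
    return memory
```

The easy-to-hard ordering matters because the memory is populated online: by the time a hard task is attempted, the retrieved context already contains solved easier analogues, which is the effect the abstract credits for the pass-rate gains.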
Related papers
- Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search [48.348209577994865]
Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments. We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning. Our agent learns to extract task-critical "atomic facts" from its interaction trajectories.
arXiv Detail & Related papers (2025-06-10T18:36:31Z)
- How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study [16.441081996257576]
This paper presents a rigorous experimental investigation into how difficulty-aware staged reinforcement learning strategies can substantially improve reasoning performance. We show that strategically selecting training data according to well-defined difficulty levels markedly enhances RL optimization. We will open-source our datasets on GitHub and Hugging Face.
arXiv Detail & Related papers (2025-04-01T14:18:38Z)
- Improving Retrospective Language Agents via Joint Policy Gradient Optimization [57.35348425288859]
RetroAct is a framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. We develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning. We conduct extensive experiments across various testing environments, demonstrating that RetroAct substantially improves task performance and decision-making processes.
arXiv Detail & Related papers (2025-03-03T12:54:54Z)
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
We present KBAlign, a self-supervised framework that enhances RAG systems through efficient model adaptation. Our key insight is to leverage the model's intrinsic capabilities for knowledge alignment through two innovative mechanisms. Experiments demonstrate that KBAlign can achieve 90% of the performance gain obtained through GPT-4-supervised adaptation.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant [23.366991558162695]
Large Language Models can generate factually incorrect information, a problem known as "hallucination".
To cope with these challenges, we propose Assistant-based Retrieval-Augmented Generation (AssistRAG).
This assistant manages memory and knowledge through tool usage, action execution, memory building, and plan specification.
arXiv Detail & Related papers (2024-11-11T09:03:52Z)
- RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models [5.0741409008225755]
Large language models (LLMs) have emerged as promising tools for solving challenging robotic tasks.
Most existing LLM-based agents lack the ability to retain and learn from past interactions.
We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions.
arXiv Detail & Related papers (2024-09-18T20:03:32Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- CEM: A Data-Efficient Method for Large Language Models to Continue Evolving From Mistakes [36.14056870453356]
Continual learning is essential for keeping Large Language Models current and addressing their shortcomings. We propose the Continue Evolving from Mistakes (CEM) method, a data-efficient approach for collecting continual pre-training (CPT) data. Experiments show that CEM substantially enhances multiple models' performance on both in-domain and out-of-domain QA tasks, achieving gains of up to 29.63%.
arXiv Detail & Related papers (2024-04-11T17:44:56Z)
- LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning [64.55001982176226]
LIBERO is a novel benchmark of lifelong learning for robot manipulation.
We focus on how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both.
We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks.
arXiv Detail & Related papers (2023-06-05T23:32:26Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is non-independent and identically distributed (non-i.i.d.), leading to inefficient meta-training.
We show that, although only trained on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- KnowRU: Knowledge Reusing via Knowledge Distillation in Multi-agent Reinforcement Learning [16.167201058368303]
Deep Reinforcement Learning (RL) algorithms have achieved dramatic progress in the multi-agent area.
To alleviate this problem, efficiently leveraging historical experience is essential.
We propose a method, named "KnowRU", for knowledge reusing.
arXiv Detail & Related papers (2021-03-27T12:38:01Z)