Synthesizing High-Quality Programming Tasks with LLM-based Expert and Student Agents
- URL: http://arxiv.org/abs/2504.07655v1
- Date: Thu, 10 Apr 2025 11:08:39 GMT
- Title: Synthesizing High-Quality Programming Tasks with LLM-based Expert and Student Agents
- Authors: Manh Hung Nguyen, Victor-Alexandru Pădurean, Alkis Gotovos, Sebastian Tschiatschek, Adish Singla
- Abstract summary: PyTaskSyn is a novel synthesis technique that first generates a programming task and then decides whether it meets certain quality criteria to be given to students. We show that PyTaskSyn significantly improves task quality compared to baseline techniques and showcases the importance of each specialized agent type in our validation pipeline.
- Score: 26.884829816265174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI is transforming computing education by enabling the automatic generation of personalized content and feedback. We investigate its capabilities in providing high-quality programming tasks to students. Despite promising advancements in task generation, a quality gap remains between AI-generated and expert-created tasks. The AI-generated tasks may not align with target programming concepts, could be incomprehensible for students to solve, or may contain critical issues such as incorrect tests. Existing works often require interventions from human teachers for validation. We address these challenges by introducing PyTaskSyn, a novel synthesis technique that first generates a programming task and then decides whether it meets certain quality criteria to be given to students. The key idea is to break this process into multiple stages performed by expert and student agents simulated using both strong and weaker generative models. Through extensive evaluation, we show that PyTaskSyn significantly improves task quality compared to baseline techniques and showcases the importance of each specialized agent type in our validation pipeline. Additionally, we conducted user studies using our publicly available web application and show that PyTaskSyn can deliver high-quality programming tasks comparable to expert-designed ones while reducing workload and costs, and being more engaging than programming tasks that are available in online resources.
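The abstract above describes a generate-then-validate pipeline with simulated expert and student agents. The following is a minimal sketch of such a loop in that spirit; the LLM callables, helper names, and acceptance criteria are illustrative assumptions, not the paper's actual implementation or prompts.

```python
# Minimal sketch of a generate-then-validate loop in the spirit of PyTaskSyn's
# expert and student agents. The LLM callables, helper names, and acceptance
# criteria are illustrative assumptions, not the paper's actual implementation.
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Task:
    description: str   # problem statement shown to students
    solution: str      # reference solution produced by the strong model
    tests: str         # assert-based tests appended to a candidate solution

def run_tests(code: str, tests: str, timeout: int = 10) -> bool:
    """Execute candidate code plus tests in a subprocess; True if everything passes."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    try:
        return subprocess.run(["python", path], timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def synthesize(concepts: list,
               generate: Callable[[list], Task],      # strong-model task generator
               expert_ok: Callable[[Task], bool],     # simulated expert validators
               student_solve: Callable[[Task], str],  # simulated (weaker-model) student
               n_students: int = 3,
               max_tries: int = 5) -> Optional[Task]:
    """Generate tasks until one passes both the expert and the student validation stages."""
    for _ in range(max_tries):
        task = generate(concepts)
        if not expert_ok(task):                        # e.g. concept coverage, sensible tests
            continue
        if not run_tests(task.solution, task.tests):   # reference solution must pass its tests
            continue
        attempts = [student_solve(task) for _ in range(n_students)]
        if any(run_tests(code, task.tests) for code in attempts):
            return task                                # comprehensible and solvable: deliver it
    return None                                        # nothing met the quality bar
```

In the paper's framing, the expert agents are simulated with stronger generative models and the student agents with weaker ones, so a task is only delivered when it is both technically sound and solvable by the intended audience.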
Related papers
- Unlimited Practice Opportunities: Automated Generation of Comprehensive, Personalized Programming Tasks [0.0]
This paper introduces and evaluates a new feature of the so-called Tutor Kai for generating comprehensive programming tasks. The presented system allows students to freely choose programming concepts and contextual themes for their tasks.
arXiv Detail & Related papers (2025-03-12T10:35:25Z)
- Math Multiple Choice Question Generation via Human-Large Language Model Collaboration [5.081508251092439]
Multiple choice questions (MCQs) are a popular method for evaluating students' knowledge.
Recent advances in large language models (LLMs) have sparked interest in automating MCQ creation.
This paper introduces a prototype tool designed to facilitate collaboration between LLMs and educators.
arXiv Detail & Related papers (2024-05-01T20:53:13Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
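As a loose illustration of the TaskBench entry above: one of the assessed dimensions, tool selection, could be scored against annotated ground truth with a set-level F1, as sketched below. The metric choice and data format are assumptions for illustration, not the benchmark's actual evaluation code.

```python
# Loose illustration only: scoring tool selection against ground-truth annotations
# with a set-level F1. The metric choice and data format are assumptions; they are
# not TaskBench's actual evaluation code.
def tool_selection_f1(predicted: list, gold: list) -> float:
    """Set-level F1 between predicted tool calls and the annotated ground-truth tools."""
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    precision = len(pred & ref) / len(pred)
    recall = len(pred & ref) / len(ref)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Example: the model's plan uses two of the three required tools.
print(tool_selection_f1(["search", "calculator"], ["search", "calculator", "plot"]))  # 0.8
```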
- Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z)
- Synthesizing a Progression of Subtasks for Block-Based Visual Programming Tasks [21.33708484899808]
We propose a novel synthesis algorithm that generates a progression of subtasks that are high-quality and well-spaced in terms of their complexity.
We show the utility of our synthesis algorithm in improving the efficacy of AI agents for solving tasks in the Karel programming environment.
arXiv Detail & Related papers (2023-05-27T16:24:36Z)
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
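One plausible reading of the SITP entry above, sketched below with illustrative names: prioritize the task whose recent success rate is changing fastest, as a proxy for "fastest learning". The actual SITP criterion and update rule are defined in that paper, not here.

```python
# One plausible reading of success-induced prioritization, with illustrative names:
# favour the task whose recent success rate is changing fastest, as a proxy for
# "fastest learning". The scoring rule and window size are assumptions, not SITP's.
from collections import deque

class SuccessPrioritizer:
    def __init__(self, task_ids, window: int = 20):
        # Sliding window of recent outcomes (1.0 = solved, 0.0 = failed) per task.
        self.history = {t: deque(maxlen=window) for t in task_ids}

    def record(self, task_id, solved: bool) -> None:
        self.history[task_id].append(1.0 if solved else 0.0)

    def _progress(self, outcomes) -> float:
        # Proxy for learning progress: success rate in the newer half vs. the older half.
        if len(outcomes) < 4:
            return 1.0  # explore tasks we have barely tried
        half = len(outcomes) // 2
        older, newer = list(outcomes)[:half], list(outcomes)[half:]
        return abs(sum(newer) / len(newer) - sum(older) / len(older))

    def next_task(self):
        # Train next on the task with the largest recent change in success rate.
        return max(self.history, key=lambda t: self._progress(self.history[t]))
```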
- Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies [64.0476282000118]
Intrinsic motivations have proven to generate a task-agnostic signal to properly allocate the training time amongst goals.
While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent of each other, only a few have studied the autonomous acquisition of interdependent tasks.
In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture.
Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences of tasks.
arXiv Detail & Related papers (2022-05-16T10:43:01Z)
- From {Solution Synthesis} to {Student Attempt Synthesis} for Block-Based Visual Programming Tasks [20.64766977405438]
We introduce a novel benchmark, StudentSyn, centered around the following challenge.
For a given student, synthesize the student's attempt on a new target task after observing the student's attempt on a fixed reference task.
This challenge is akin to that of program synthesis; however, instead of a solution (i.e., a program an expert would write), the goal here is to synthesize a student attempt (i.e., a program that a given student would write).
arXiv Detail & Related papers (2022-05-03T01:32:47Z)
- Environment Generation for Zero-Shot Compositional Reinforcement Learning [105.35258025210862]
Compositional Design of Environments (CoDE) trains a Generator agent to automatically build a series of compositional tasks tailored to the agent's current skill level.
We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments.
CoDE yields a 4x higher success rate than the strongest baseline and demonstrates strong performance on real websites after learning on 3500 primitive tasks.
arXiv Detail & Related papers (2022-01-21T21:35:01Z)
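Read loosely, the CoDE entry above describes a generator that composes environments matched to the learner's current skill. The sketch below shows such a generator-learner loop under illustrative assumptions; the component names, thresholds, and difficulty update are not from the paper.

```python
# Schematic generator-learner curriculum loop, loosely in the spirit of the CoDE
# entry above: compose environments from primitives and grow their size as the
# learner's success rate improves. Component names, thresholds, and the difficulty
# update are illustrative assumptions, not the paper's method.
import random

PRIMITIVES = ["login_page", "search_page", "form_page", "checkout_page", "profile_page"]

def generate_environment(difficulty: int) -> list:
    """Compose an environment from `difficulty` primitive components (e.g. pages/rooms)."""
    return random.choices(PRIMITIVES, k=max(1, difficulty))

def curriculum_loop(train_and_eval, rounds: int = 1000, target_success: float = 0.7) -> int:
    """Grow task difficulty whenever the learner clears the target success rate."""
    difficulty = 1
    for _ in range(rounds):
        env = generate_environment(difficulty)
        success_rate = train_and_eval(env)        # caller-supplied RL training/eval step
        if success_rate > target_success:
            difficulty += 1                        # ready for harder compositions
        elif success_rate < target_success / 2:
            difficulty = max(1, difficulty - 1)    # back off if tasks became too hard
    return difficulty
```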
- Task-Specific Normalization for Continual Learning of Blind Image Quality Models [105.03239956378465]
We present a simple yet effective continual learning method for blind image quality assessment (BIQA).
The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability.
We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score.
The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight $K$-means gating mechanism.
arXiv Detail & Related papers (2021-07-28T15:21:01Z)
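The BIQA entry above gives enough detail for a rough sketch of the prediction side: a frozen backbone feature, one head per task, and a gated weighted sum over heads. The distance-to-centroid gating below is a simplified stand-in for the paper's lightweight K-means gating; all shapes and names are illustrative.

```python
# Simplified stand-in for the gated aggregation described above: a frozen backbone
# feature, one linear head per IQA task, and a weighted sum over heads. Gating by
# distance to one centroid per task is an illustrative proxy for the paper's
# lightweight K-means gating; shapes and names are assumptions.
import numpy as np

def quality_score(features: np.ndarray,
                  heads: list,              # one linear head (weight vector) per task
                  centroids: np.ndarray,    # one centroid per task, in feature space
                  temperature: float = 1.0) -> float:
    """Weighted sum of per-head predictions, gated by proximity to each task's centroid."""
    preds = np.array([w @ features for w in heads])        # per-task quality predictions
    dists = np.linalg.norm(centroids - features, axis=1)   # distance to each task's cluster
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()                               # softmax-style gate over heads
    return float(weights @ preds)

# Example with 3 tasks and an 8-dimensional frozen-backbone feature.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
heads = [rng.normal(size=8) for _ in range(3)]
centroids = rng.normal(size=(3, 8))
print(quality_score(x, heads, centroids))
```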
- Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration [116.28433607265573]
We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents.
In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently.
We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning and learning based baselines.
arXiv Detail & Related papers (2020-10-19T21:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.