Play to Grade: Testing Coding Games as Classifying Markov Decision
Process
- URL: http://arxiv.org/abs/2110.14615v1
- Date: Wed, 27 Oct 2021 17:37:33 GMT
- Title: Play to Grade: Testing Coding Games as Classifying Markov Decision
Process
- Authors: Allen Nie, Emma Brunskill, Chris Piech
- Abstract summary: We formalize the challenge of providing feedback to interactive programs as a task of classifying Markov Decision Processes (MDPs).
Our method enables an automatic feedback system for interactive code assignments.
We release a dataset of 711,274 anonymized student submissions to a single assignment with hand-coded bug labels to support future research.
- Score: 45.147473767394104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary coding education often presents students with the task of
developing programs that have user interaction and complex dynamic systems,
such as mouse-based games. While pedagogically compelling, there are no
contemporary autonomous methods for providing feedback. Notably, interactive
programs are impossible to grade by traditional unit tests. In this paper we
formalize the challenge of providing feedback to interactive programs as a task
of classifying Markov Decision Processes (MDPs). Each student's program fully
specifies an MDP where the agent needs to operate and decide, under reasonable
generalization, if the dynamics and reward model of the input MDP should be
categorized as correct or broken. We demonstrate that by designing a
cooperative objective between an agent and an autoregressive model, we can use
the agent to sample differential trajectories from the input MDP that allow a
classifier to determine membership: Play to Grade. Our method enables an
automatic feedback system for interactive code assignments. We release a
dataset of 711,274 anonymized student submissions to a single assignment with
hand-coded bug labels to support future research.
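The grading loop the abstract describes — an agent plays each student-submitted MDP and a classifier labels the resulting trajectories — can be sketched as follows. This is a minimal illustration of the data flow only: the paper learns the agent and classifier jointly, whereas `ToyBounce`, the random policy, and the reward-sum classifier here are invented stand-ins.

```python
import random

class ToyBounce:
    """Toy stand-in for a student's game MDP: a correct submission awards
    1 reward per step; a buggy one never awards reward."""
    def __init__(self, buggy):
        self.buggy = buggy

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 0 if self.buggy else 1
        return self.t, reward, self.t >= 10  # state, reward, done

def rollout(mdp, policy, horizon=50):
    """Sample one (state, action, reward) trajectory from a submission's MDP."""
    state, traj = mdp.reset(), []
    for _ in range(horizon):
        action = policy(state)
        state, reward, done = mdp.step(action)
        traj.append((state, action, reward))
        if done:
            break
    return traj

def grade(mdp, policy, classify, n_rollouts=5):
    """Label a submission by whether any sampled trajectory looks broken."""
    trajs = [rollout(mdp, policy) for _ in range(n_rollouts)]
    return "broken" if any(classify(t) for t in trajs) else "correct"

# Stand-in policy and trajectory classifier (the paper trains both).
policy = lambda s: random.choice(["left", "right"])
classify = lambda traj: sum(r for _, _, r in traj) == 0  # no reward => bug

print(grade(ToyBounce(buggy=False), policy, classify))  # correct
print(grade(ToyBounce(buggy=True), policy, classify))   # broken
```

The key design point the sketch preserves is that the grader never inspects the student's code, only the behavior of the MDP it induces.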
Related papers
- Prompt Programming: A Platform for Dialogue-based Computational Problem Solving with Generative AI Models [22.339868419855904]
Students increasingly rely on generative AI tools for programming assistance, often without formal instruction or guidance.
This highlights a need to teach students how to effectively interact with AI models.
We developed a novel platform for prompt programming that enables authentic dialogue-based interactions.
arXiv Detail & Related papers (2025-03-06T09:56:07Z)
- Program Synthesis Dialog Agents for Interactive Decision-Making [15.76727860626721]
We propose BeNYfits, a new benchmark for determining user eligibility for social benefits opportunities through interactive decision-making.
Our experiments show that GPT-4o scores only 35.7 F1 using a ReAct-style chain-of-thought.
Our agent, ProADA, improves the F1 score to 55.6 while maintaining nearly the same number of dialog turns.
arXiv Detail & Related papers (2025-02-26T22:53:01Z)
- QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search [89.97082652805904]
We propose QLASS (Q-guided Language Agent Stepwise Search), to automatically generate annotations by estimating Q-values.
With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value.
We empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis.
arXiv Detail & Related papers (2025-02-04T18:58:31Z)
- MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems [11.725845532549558]
This work focuses on the Rapid Serial Visual Presentation (RSVP) paradigm of Brain-Computer Interfaces (BCIs) using noninvasive electroencephalography (EEG).
To improve performance in the classification of symbols while controlling the classification speed, we incorporate the typing setup into training by proposing a Partially Observable Markov Decision Process (POMDP) approach.
Experiments show that the proposed approach, MarkovType, results in a more accurate typing system compared to competitors.
arXiv Detail & Related papers (2024-12-20T12:59:41Z)
- Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance Object Navigation (CoIN) is a new task setting in which the agent actively resolves uncertainties about the target instance.
We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA).
First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description.
An Interaction Trigger module determines whether to ask a question to the human, continue or halt navigation.
arXiv Detail & Related papers (2024-12-02T08:16:38Z)
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules; the first is an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- WIP: A Unit Testing Framework for Self-Guided Personalized Online Robotics Learning [3.613641107321095]
This paper focuses on creating a system for unit testing while integrating it into the course workflow.
In line with the framework's personalized, student-centered approach, this method makes it easier for students to revise and debug their programming work.
The course workflow, updated to include unit tests, will strengthen the learning environment and make it more interactive, so that students can learn how to program robots in a self-guided fashion.
arXiv Detail & Related papers (2024-05-18T00:56:46Z)
- Prompt Customization for Continual Learning [57.017987355717935]
We reformulate the prompting approach for continual learning and propose the prompt customization (PC) method.
PC mainly comprises a prompt generation module (PGM) and a prompt modulation module (PMM).
We evaluate our method on four benchmark datasets for three diverse settings, including the class, domain, and task-agnostic incremental learning tasks.
arXiv Detail & Related papers (2024-04-28T03:28:27Z)
- Learning Label Modular Prompts for Text Classification in the Wild [56.66187728534808]
We propose text classification in-the-wild, which introduces different non-stationary training/testing stages.
Decomposing a complex task into modular components can enable robust generalisation under such non-stationary environment.
We propose MODULARPROMPT, a label-modular prompt tuning framework for text classification tasks.
arXiv Detail & Related papers (2022-11-30T16:26:38Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Q-learning with Language Model for Edit-based Unsupervised Summarization [19.332743860240264]
We propose a new approach based on Q-learning with an edit-based summarization.
The method combines two key modules to form an Editorial Agent and Language Model converter.
Q-learning is leveraged to train the agent to produce proper edit actions.
arXiv Detail & Related papers (2020-10-09T05:47:00Z)
- A Markov Decision Process Approach to Active Meta Learning [24.50189361694407]
In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task.
In meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously.
arXiv Detail & Related papers (2020-09-10T15:45:34Z)
- CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions [61.724894233252414]
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem.
Existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.
We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels.
arXiv Detail & Related papers (2020-07-15T09:52:35Z)
- Learning and Solving Regular Decision Processes [15.533842336139067]
Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards.
We build on automata learning techniques with history clustering to learn such a Mealy machine and solve it by adapting MCTS.
arXiv Detail & Related papers (2020-03-02T16:36:16Z)
- Learning Non-Markovian Reward Models in MDPs [0.0]
We show how to formalise the non-Markovian reward function using a Mealy machine.
In our formal setting, we consider a Markov decision process (MDP) that models the dynamic of the environment in which the agent evolves.
While the MDP is known to the agent, the reward function is unknown to the agent and must be learnt.
arXiv Detail & Related papers (2020-01-25T10:51:42Z)
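The entry above formalises a non-Markovian reward function as a Mealy machine: a finite-state transducer that maps (state, input symbol) to (next state, output reward), so the reward can depend on history rather than only the current MDP state. A minimal sketch, with an invented two-state example (reward 1 exactly when the agent observes "a" immediately followed by "b"); none of these names come from the paper.

```python
class MealyReward:
    """Finite-state reward transducer: (state, input) -> (next state, output)."""
    def __init__(self, transitions, start):
        self.transitions = transitions
        self.state = start

    def step(self, symbol):
        # Advance the machine on one observed symbol and emit its reward.
        self.state, reward = self.transitions[(self.state, symbol)]
        return reward

# q0: waiting for "a"; q1: just saw "a", reward if the next symbol is "b".
trans = {
    ("q0", "a"): ("q1", 0),
    ("q0", "b"): ("q0", 0),
    ("q1", "a"): ("q1", 0),
    ("q1", "b"): ("q0", 1),
}
m = MealyReward(trans, "q0")
print([m.step(s) for s in "aabba"])  # [0, 0, 1, 0, 0]
```

The same observation sequence would yield different rewards depending on what came before it, which is exactly the non-Markovian behavior a plain state-based reward model cannot express.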
This list is automatically generated from the titles and abstracts of the papers in this site.