Clustering MOOC Programming Solutions to Diversify Their Presentation to Students
- URL: http://arxiv.org/abs/2403.19398v2
- Date: Fri, 11 Oct 2024 21:45:33 GMT
- Title: Clustering MOOC Programming Solutions to Diversify Their Presentation to Students
- Authors: Elizaveta Artser, Anastasiia Birillo, Yaroslav Golubev, Maria Tigina, Hieke Keuning, Nikolay Vyahhi, Timofey Bryksin
- Abstract summary: A lot of MOOCs simply show the most recent solutions, disregarding their diversity or quality, and thus hindering the students' opportunity to learn.
We adapted the existing plagiarism detection tool JPlag to Python submissions on Hyperskill, a popular MOOC platform.
We developed our own tool called Rhubarb, which standardizes solutions that are algorithmically the same, then calculates the structure-aware edit distance between them, and then applies clustering.
- Score: 6.219350126324697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many MOOCs, whenever a student completes a programming task, they can see previous solutions of other students to find potentially different ways of solving the problem and to learn new coding constructs. However, a lot of MOOCs simply show the most recent solutions, disregarding their diversity or quality, and thus hindering the students' opportunity to learn. In this work, we explore this novel problem for the first time. To solve it, we adapted the existing plagiarism detection tool JPlag to Python submissions on Hyperskill, a popular MOOC platform. However, due to the tool's inner algorithm, JPlag fully processed only 46 out of 867 studied tasks. Therefore, we developed our own tool called Rhubarb. This tool first standardizes solutions that are algorithmically the same, then calculates the structure-aware edit distance between them, and then applies clustering. Finally, it selects one example from each of the largest clusters, thus ensuring their diversity. Rhubarb was able to handle all 867 tasks successfully. We compared different approaches on a set of 59 real-life tasks that both tools could process. Eight experts rated the selected solutions based on diversity, code quality, and usefulness. The default platform approach of simply selecting recent submissions received on average 3.12 out of 5, JPlag received 3.77, and Rhubarb received 3.50. To ensure both quality and coverage, we created a system that combines both tools. We conclude our work by discussing the future of this new problem and the research needed to solve it better.
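The pipeline the abstract describes (standardize algorithmically identical solutions, compute a structure-aware distance, cluster, then pick one example per large cluster) can be sketched in Python. This is a minimal illustration under stated assumptions, not Rhubarb's actual implementation: canonicalization here just renames every identifier via the standard `ast` module, and `difflib.SequenceMatcher` over AST dumps stands in for the paper's structure-aware edit distance.

```python
import ast
from difflib import SequenceMatcher

def standardize(code: str) -> str:
    """Map a solution to a canonical form: parse it into an AST and
    rename all variables to a placeholder, so that solutions differing
    only in naming produce identical dumps."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_v"
        elif isinstance(node, ast.arg):
            node.arg = "_v"
    return ast.dump(tree)

def distance(a: str, b: str) -> float:
    """A crude stand-in for a structure-aware edit distance:
    1 minus the similarity ratio of the canonical AST dumps."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def cluster(solutions, threshold=0.3):
    """Greedy single-link clustering: a solution joins the first
    cluster whose representative is within `threshold`."""
    clusters = []  # list of (representative canonical form, members)
    for src in solutions:
        canon = standardize(src)
        for rep, members in clusters:
            if distance(canon, rep) <= threshold:
                members.append(src)
                break
        else:
            clusters.append((canon, [src]))
    return clusters

def diverse_examples(solutions, k=3, threshold=0.3):
    """Select one example from each of the k largest clusters."""
    clusters = cluster(solutions, threshold)
    clusters.sort(key=lambda c: len(c[1]), reverse=True)
    return [members[0] for _, members in clusters[:k]]
```

For instance, two loop-based summation solutions that differ only in variable names collapse into one cluster, while `print(sum(nums))` forms its own, so both styles get shown to the student.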
Related papers
- AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming [2.3020018305241337]
We introduce AlgoSimBench, a new benchmark designed to assess the ability to identify algorithmically similar problems (ASPs). AlgoSimBench consists of 1317 problems, annotated with distinct fine-grained algorithm tags, from which we construct 402 multiple-choice questions (MCQs). Our evaluation reveals that LLMs struggle to identify ASPs, with the best-performing model (o3-mini) achieving only 65.9% accuracy on the MCQ task. We propose attempted solution matching (ASM), a novel method for improving problem similarity detection.
arXiv Detail & Related papers (2025-07-21T08:34:20Z) - Generative Modeling for Mathematical Discovery [0.19791587637442667]
We present a new implementation of the genetic algorithm funsearch.
Our aim is to generate examples of interest to mathematicians.
It does not require expertise in machine learning or access to high-performance computing resources.
arXiv Detail & Related papers (2025-03-14T03:54:43Z) - Diverse Inference and Verification for Advanced Reasoning [19.88677753421871]
Reasoning LLMs such as OpenAI o1, o3 and DeepSeek R1 have made significant progress in mathematics and coding.
We use a diverse inference approach that combines multiple models and methods at test time.
We find that verifying mathematics and code problems, and rejection sampling on other problems is simple and effective.
arXiv Detail & Related papers (2025-02-14T07:22:25Z) - Can Language Models Solve Olympiad Programming? [40.54366634332231]
This paper introduces the USACO benchmark with 307 problems from the USA Computing Olympiad.
We construct and test a range of LM inference methods for competitive programming for the first time.
We find GPT-4 only achieves an 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting.
arXiv Detail & Related papers (2024-04-16T23:27:38Z) - Orca-Math: Unlocking the potential of SLMs in Grade School Math [10.206509967833664]
A recent study hypothesized that the smallest model size, needed to achieve over 80% accuracy on the GSM8K benchmark, is 34 billion parameters.
To reach this level of performance with smaller models, researchers often train SLMs to generate Python code or use tools to help avoid calculation errors.
Our approach has the following key elements: a high-quality synthetic dataset of 200K math problems created using a multi-agent setup where agents collaborate to create the data.
arXiv Detail & Related papers (2024-02-16T23:44:38Z) - Interpretable Decision Tree Search as a Markov Decision Process [8.530182510074983]
Finding an optimal decision tree for a supervised learning task is a challenging problem to solve at scale.
It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling.
We propose instead to scale the resolution of such MDPs using an information-theoretic test-generating function that generates candidate tests for every state.
arXiv Detail & Related papers (2023-09-22T08:18:08Z) - Tree of Thoughts: Deliberate Problem Solving with Large Language Models [52.31950122881687]
We introduce a new framework for language model inference, Tree of Thoughts (ToT).
ToT generalizes over the popular Chain of Thought approach to prompting language models.
Our experiments show that ToT significantly enhances language models' problem-solving abilities.
arXiv Detail & Related papers (2023-05-17T23:16:17Z) - NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision
Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z) - The Machine Learning for Combinatorial Optimization Competition (ML4CO):
Results and Insights [59.93939636422896]
The ML4CO competition aims to improve state-of-the-art optimization solvers by replacing key components.
The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate routing configuration.
arXiv Detail & Related papers (2022-03-04T17:06:00Z) - Winning solutions and post-challenge analyses of the ChaLearn AutoDL
challenge 2019 [112.36155380260655]
This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series.
Results show that DL methods dominated, though popular Neural Architecture Search (NAS) was impractical.
A high-level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator".
arXiv Detail & Related papers (2022-01-11T06:21:18Z) - Learning by Fixing: Solving Math Word Problems with Weak Supervision [70.62896781438694]
Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions.
We introduce a weakly-supervised paradigm for learning MWPs.
Our method only requires the annotations of the final answers and can generate various solutions for a single problem.
arXiv Detail & Related papers (2020-12-19T03:10:21Z) - Adaptive Submodular Meta-Learning [28.24164217929491]
We introduce and study an adaptive submodular meta-learning problem.
The input of our problem is a set of items, where each item has a random state which is initially unknown.
Our objective is to adaptively select a group of items that achieve the best performance over a set of tasks.
arXiv Detail & Related papers (2020-12-11T01:28:55Z) - The Importance of Good Starting Solutions in the Minimum Sum of Squares
Clustering Problem [0.0]
The clustering problem has many applications in Machine Learning, Operations Research, and Statistics.
We propose three algorithms to create starting solutions for improvement algorithms for this problem.
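The blurb above concerns starting solutions for the minimum sum-of-squares (k-means) objective. As a generic illustration of why starting solutions matter, this sketch pairs the well-known k-means++ seeding with Lloyd's improvement iterations on 1-D data; these are standard textbook methods, not the paper's three proposed algorithms.

```python
import random

def kmeans_pp_start(points, k, seed=0):
    """k-means++-style starting solution: each new center is sampled
    with probability proportional to its squared distance from the
    nearest already-chosen center, spreading the initial centers out."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

def lloyd_improve(points, centers, iters=20):
    """A standard improvement algorithm: Lloyd's iterations alternate
    assigning points to their nearest center and recomputing each
    center as the mean of its assigned points."""
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: (p - centers[i]) ** 2)
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers
```

On two well-separated groups such as `[0, 1, 2]` and `[10, 11, 12]`, a spread-out start lets Lloyd's method converge to the cluster means; a poor start (both centers in one group) is exactly the failure mode that good starting solutions are meant to avoid.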
arXiv Detail & Related papers (2020-04-06T22:13:41Z) - Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding
Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and the maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.