Estimating Difficulty Levels of Programming Problems with Pre-trained Model
- URL: http://arxiv.org/abs/2406.08828v1
- Date: Thu, 13 Jun 2024 05:38:20 GMT
- Title: Estimating Difficulty Levels of Programming Problems with Pre-trained Model
- Authors: Zhiyuan Wang, Wei Zhang, Jun Wang
- Abstract summary: The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning.
We formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code.
For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model.
- Score: 18.92661958433282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the demand for programming skills grows across industries and academia, students often turn to Programming Online Judge (POJ) platforms for coding practice and competition. The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning. However, current methods of determining difficulty levels either require extensive expert annotations or take a long time to accumulate enough student solutions for each problem. To address this issue, we formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code. For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model. We built two POJ datasets for the task and the results demonstrate the effectiveness of the proposed approach and the contributions of both modalities.
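The coupling of a text-modality and a code-modality pre-trained model can be pictured as late fusion: encode the problem description with one model, encode the solution code with the other, concatenate the two embeddings, and classify the difficulty level. The sketch below uses stand-in encoders and a hypothetical linear head (the paper's actual model choices and fusion details are not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

def text_encoder(description: str) -> np.ndarray:
    """Stand-in for a pre-trained text encoder (e.g. a BERT-style model)."""
    local = np.random.default_rng(abs(hash(description)) % 2**32)
    return local.standard_normal(768)

def code_encoder(solution: str) -> np.ndarray:
    """Stand-in for a pre-trained code encoder (e.g. a CodeBERT-style model)."""
    local = np.random.default_rng(abs(hash(solution)) % 2**32)
    return local.standard_normal(768)

def predict_difficulty(description: str, solution: str,
                       W: np.ndarray, b: np.ndarray) -> int:
    """Late fusion: concatenate both embeddings, then apply a linear head."""
    fused = np.concatenate([text_encoder(description), code_encoder(solution)])
    logits = W @ fused + b
    return int(np.argmax(logits))  # index of the predicted difficulty level

n_levels = 5  # hypothetical number of difficulty levels
W = rng.standard_normal((n_levels, 1536)) * 0.01
b = np.zeros(n_levels)
level = predict_difficulty("Sum two integers.",
                           "print(sum(map(int, input().split())))", W, b)
```

In practice the head would be trained end-to-end on labeled POJ problems; the stand-in encoders here only illustrate the two-modality input and the fused classification step.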
Related papers
- Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization [126.27645170941268]
We present Easy2Hard-Bench, a collection of 6 benchmark datasets spanning various domains.
Each problem within these datasets is annotated with numerical difficulty scores.
We provide a comprehensive analysis of their performance and generalization capabilities across varying levels of difficulty.
arXiv Detail & Related papers (2024-09-27T03:49:56Z)
- Learning Task Decomposition to Assist Humans in Competitive Programming [90.4846613669734]
We introduce a novel objective for learning task decomposition, termed assistive value (AssistV).
We collect a dataset of human repair experiences on different decomposed solutions.
Under 177 hours of human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
arXiv Detail & Related papers (2024-06-07T03:27:51Z)
- Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z)
- PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models [10.491051578439722]
We propose the idea of programming problem merging (PPM) and provide two implementations of this idea; we apply our tool to two widely-used datasets.
The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems.
arXiv Detail & Related papers (2024-01-28T02:27:38Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
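The instance-level step described above (order each task's instances from easy to difficult, then batch them) can be sketched as follows; the difficulty scores are assumed to be given, and the function names are illustrative only:

```python
from typing import Any

def easy_to_difficult_batches(instances: list[Any],
                              difficulty: dict[Any, float],
                              batch_size: int) -> list[list[Any]]:
    """Sort instances by ascending difficulty, then split into mini-batches."""
    ordered = sorted(instances, key=lambda x: difficulty[x])
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

batches = easy_to_difficult_batches(
    ["a", "b", "c", "d", "e"],
    {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3, "e": 0.7},
    batch_size=2,
)
# batches: [["b", "d"], ["c", "e"], ["a"]]
```

The curriculum effect comes from feeding these batches to the trainer in order, so early updates see only low-difficulty instances.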
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models [20.039580079339537]
Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems.
We represent problems in a space of semantic descriptors describing the programming skills required to solve them.
ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors.
arXiv Detail & Related papers (2023-10-15T14:57:14Z) - Tag Prediction of Competitive Programming Problems using Deep Learning
Techniques [0.0]
Competitive programming is a popular way to develop programming skills.
It can be difficult for novices and even veteran programmers to navigate the wide collection of problems.
Automated tagging of the problems via text classification can ease this navigation.
arXiv Detail & Related papers (2023-08-03T16:39:02Z) - Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning [10.889271604723312]
Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks.
We investigate two approaches to leverage the training data in a few-shot prompting scenario: dynamic program prompting and program distillation.
Our experiments on three standard math word problem (MWP) datasets demonstrate the effectiveness of these approaches.
arXiv Detail & Related papers (2023-05-29T16:01:40Z) - Towards a Holistic Understanding of Mathematical Questions with
Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes.
Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
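Few-shot feedback classification in the prototypical style can be sketched as: embed the instructor's few labeled examples, average per class to form prototypes, and assign new student code to the nearest prototype. The 2-D embeddings and class labels below are stand-ins, not the paper's actual representation:

```python
import numpy as np

def prototypes(support: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """One prototype per feedback class: the mean of its support embeddings."""
    return {label: np.mean(vecs, axis=0) for label, vecs in support.items()}

def classify(query: np.ndarray, protos: dict[str, np.ndarray]) -> str:
    """Nearest-prototype assignment by Euclidean distance."""
    return min(protos, key=lambda label: np.linalg.norm(query - protos[label]))

# A few instructor-labeled examples per feedback class (hypothetical).
support = {
    "off_by_one": [np.array([1.0, 0.0]), np.array([0.8, 0.2])],
    "wrong_loop": [np.array([0.0, 1.0]), np.array([0.1, 0.9])],
}
protos = prototypes(support)
label = classify(np.array([0.9, 0.1]), protos)
# label: "off_by_one"
```

The meta-learning part of the approach lies in training the embedding function so that such nearest-prototype decisions generalize to unseen questions; this sketch only shows the prototype step.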
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.