A Systematic Study of Time Limit Exceeded Errors in Online Programming Assignments
- URL: http://arxiv.org/abs/2510.14339v1
- Date: Thu, 16 Oct 2025 06:18:55 GMT
- Title: A Systematic Study of Time Limit Exceeded Errors in Online Programming Assignments
- Authors: Jialu Zhang, Jialiang Gu, Wangmeiyu Zhang, José Pablo Cambronero, John Kolesar, Ruzica Piskac, Daming Li, Hanyuan Shi,
- Abstract summary: This paper presents the first large-scale empirical study of TLE errors in online programming.<n>We analyze 1000 Codeforces submissions with TLE errors, classified their root causes, and traced how users attempted to fix them.<n>We introduce Nettle, the first automated repair tool specifically designed for TLE errors, and Nettle-Eval, the first framework for evaluating TLE repairs.
- Score: 3.5043598215781393
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online programming platforms such as Codeforces and LeetCode attract millions of users seeking to learn to program or refine their skills for industry interviews. A major challenge for these users is the Time Limit Exceeded (TLE) error, triggered when a program exceeds the execution time bound. Although designed as a performance safeguard, TLE errors are difficult to resolve: error messages provide no diagnostic insight, platform support is minimal, and existing debugging tools offer little help. As a result, many users abandon their submissions after repeated TLE failures. This paper presents the first large-scale empirical study of TLE errors in online programming. We manually analyzed 1000 Codeforces submissions with TLE errors, classified their root causes, and traced how users attempted to fix them. Our analysis shows that TLE errors often arise not only from inefficient algorithms but also from infinite loops, improper data structure use, and inefficient I/O, challenging the conventional view that TLEs are purely performance issues. Guided by these findings, we introduce Nettle, the first automated repair tool specifically designed for TLE errors, and Nettle-Eval, the first framework for evaluating TLE repairs. Integrating LLMs with targeted automated feedback generated by the compiler and test cases, Nettle produces small, correct code edits that eliminate TLEs while preserving functionality. Evaluated on the same 1000 real-world cases, Nettle achieves a 98.5% fix rate, far exceeding the strongest LLM baseline, and all of its repairs pass both Nettle-Eval and the platform's official checker, confirming the reliability of our framework.
Related papers
- ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement [57.98138819417949]
We propose ErrorLLM, a framework that explicitly models text-to- querying.<n>We show that ErrorLLM achieves the most significant improvements over backbone initial generation.<n>ErrorLLM addresses both sides by high detection F1 score while maintaining refinement effectiveness.
arXiv Detail & Related papers (2026-03-04T05:27:20Z) - Summary-Mediated Repair: Can LLMs use code summarisation as a tool for program repair? [0.0]
Large Language Models (LLMs) often produce code with subtle implementation-level bugs despite strong benchmark performance.<n>We propose summary-mediated repair, a prompt-only pipeline for program repair.
arXiv Detail & Related papers (2025-11-24T05:33:38Z) - LLM-Based Repair of Static Nullability Errors [14.857404348789201]
We present NullRepair, a system that integrates LLMs into a structured workflow for resolving nullability errors from a nullability checker.<n>NullRepair resolves an average of 72% of the errors that remain after applying a state-of-the-art annotation inference technique.<n>Unlike a naively-prompted LLM, NullRepair also largely preserves program semantics.
arXiv Detail & Related papers (2025-07-28T09:55:04Z) - Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z) - Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair.<n>We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program.<n>Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z) - Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization [0.0]
Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments.<n>We propose a novel approach that combines the strengths of both FM-based fault localization and Large Language Models (LLMs)<n>Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements.
arXiv Detail & Related papers (2024-12-19T12:08:44Z) - ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
TOOLSCAN is a new benchmark to identify error patterns in LLM output on tool-use tasks.<n>We show that even the most prominent LLMs exhibit these error patterns in their outputs.<n>Researchers can use these insights from TOOLSCAN to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z) - ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation [31.363781211927947]
Large language models (LLMs) have achieved impressive performance in code generation.<n>LLMs are susceptible to error accumulation during code generation.<n>We propose ROCODE, which integrates the backtracking mechanism and program analysis into LLMs for code generation.
arXiv Detail & Related papers (2024-11-11T16:39:13Z) - LecPrompt: A Prompt-based Approach for Logical Error Correction with CodeBERT [28.711745671275477]
LecPrompt is a prompt-based approach to localize and repair logical errors.
It harnesses the capabilities of CodeBERT, a transformer-based large language model trained on code.
For Python, LecPrompt achieves a noteworthy 74.58% top-1 token-level repair accuracy.
In Java, LecPrompt delivers a 69.23% top-1 token-level repair accuracy.
arXiv Detail & Related papers (2024-10-10T01:56:04Z) - Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE)<n>RISE injects predefined subtle errors into pivotal tokens in reasoning or steps to construct hard pairs for error mitigation.<n>Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.<n>We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.<n>We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.