Explaining Competitive-Level Programming Solutions using LLMs
- URL: http://arxiv.org/abs/2307.05337v1
- Date: Tue, 11 Jul 2023 15:26:49 GMT
- Title: Explaining Competitive-Level Programming Solutions using LLMs
- Authors: Jierui Li, Szymon Tworkowski, Yingying Wu and Raymond Mooney
- Abstract summary: We show that despite poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity in describing and explaining solutions.
Our explanation generation methodology can generate a structured solution explanation for the problem containing descriptions and analysis.
- Score: 3.560501183771493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we approach competitive-level programming problem-solving as a
composite task of reasoning and code generation. We propose a novel method to
automatically annotate natural language explanations to \textit{<problem,
solution>} pairs. We show that despite poor performance in solving
competitive-level programming problems, state-of-the-art LLMs exhibit a strong
capacity in describing and explaining solutions. Our explanation generation
methodology can generate a structured solution explanation for the problem
containing descriptions and analysis. To evaluate the quality of the annotated
explanations, we examine their effectiveness in two aspects: 1) satisfying the
human programming expert who authored the oracle solution, and 2) aiding LLMs
in solving problems more effectively. The experimental results on the
CodeContests dataset demonstrate that while LLM GPT3.5's and GPT-4's abilities
in describing the solution are comparable, GPT-4 shows a better understanding
of the key idea behind the solution.
Related papers
- Can LLMs plan paths with extra hints from solvers? [2.874944508343474]
Large Language Models (LLMs) have shown remarkable capabilities in natural language processing, mathematical problem solving, and tasks related to program synthesis.
This paper explores an approach for enhancing LLM performance in solving a classical robotic planning task by integrating solver-generated feedback.
arXiv Detail & Related papers (2024-10-07T14:00:08Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding [28.191029786204624]
We introduce the Long Question Coreference Adaptation (LQCA) method to enhance the performance of large language models (LLMs)
This framework focuses on coreference resolution tailored to long contexts, allowing the model to identify and manage references effectively.
The framework provides easier-to-handle partitions for LLMs, promoting better understanding.
arXiv Detail & Related papers (2024-10-02T15:39:55Z) - Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems [59.72548591120689]
We introduce a new benchmark, SearchBench, containing 11 unique search problem types.
We show that even the most advanced LLMs fail to solve these problems end-to-end in text.
Instructing LLMs to generate code that solves the problem helps, but only slightly, e.g., GPT4's performance rises to 11.7%.
arXiv Detail & Related papers (2024-06-18T00:44:58Z) - Estimating Difficulty Levels of Programming Problems with Pre-trained Model [18.92661958433282]
The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning.
We formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code.
For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model.
arXiv Detail & Related papers (2024-06-13T05:38:20Z) - Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z) - Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's peiceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, theThoughtived performance of GPT-4 has experienced a cliff like decline in problems after September 2021 consistently across all the difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.