Programming Puzzles
- URL: http://arxiv.org/abs/2106.05784v1
- Date: Thu, 10 Jun 2021 14:37:28 GMT
- Title: Programming Puzzles
- Authors: Tal Schuster, Ashwin Kalyan, Oleksandr Polozov, Adam Tauman Kalai
- Abstract summary: We release an open-source dataset of Python Programming Puzzles (P3).
The puzzles are objective in that each one is specified entirely by the source code of its verifier $f$.
They do not require an answer key or input/output examples, nor do they depend on natural language understanding.
- Score: 31.797853936252594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new type of programming challenge called programming puzzles,
as an objective and comprehensive evaluation of program synthesis, and release
an open-source dataset of Python Programming Puzzles (P3). Each puzzle is
defined by a short Python program $f$, and the goal is to find an input $x$
which makes $f$ output "True". The puzzles are objective in that each one is
specified entirely by the source code of its verifier $f$, so evaluating $f(x)$
is all that is needed to test a candidate solution $x$. They do not require an
answer key or input/output examples, nor do they depend on natural language
understanding. The dataset is comprehensive in that it spans problems of a
range of difficulties and domains, ranging from trivial string manipulation
problems that are immediately obvious to human programmers (but not necessarily
to AI), to classic programming puzzles (e.g., Towers of Hanoi), to
interview/competitive-programming problems (e.g., dynamic programming), to
longstanding open problems in algorithms and mathematics (e.g., factoring). The
objective nature of P3 readily supports self-supervised bootstrapping. We
develop baseline enumerative program synthesis and GPT-3 solvers that are
capable of solving easy puzzles -- even without access to any reference
solutions -- by learning from their own past solutions. Based on a small user
study, we find puzzle difficulty to correlate between human programmers and the
baseline AI solvers.
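To make the setup concrete, here is a minimal illustrative puzzle in the P3 style (invented for this summary, not taken from the dataset). The verifier $f$ fully specifies the problem, and a candidate solution $x$ is checked simply by evaluating $f(x)$:

```python
def f(x: str) -> bool:
    """Verifier for a trivial string-manipulation puzzle: find a
    five-character string consisting only of the letter 'a'."""
    return len(x) == 5 and set(x) == {"a"}

# Evaluating f(x) is all that is needed to test a candidate solution.
assert f("aaaaa")      # a valid solution
assert not f("abcde")  # not a solution
```

Because success is defined purely by whether $f(x)$ returns True, a solver can be written against this interface with no answer key. The sketch below is a deliberately naive enumerative solver (a hypothetical helper, assuming a small search space of lowercase strings); the paper's baseline enumerative and GPT-3 solvers are considerably more capable:

```python
import itertools
import string

def solve_by_enumeration(puzzle, max_len=5):
    """Try candidate strings in order of increasing length until the
    verifier accepts one; return the first solution found, or None."""
    for n in range(max_len + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=n):
            x = "".join(chars)
            try:
                if puzzle(x):
                    return x
            except Exception:
                continue  # a crashing candidate is simply not a solution
    return None

print(solve_by_enumeration(f))  # -> "aaaaa"
```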
Related papers
- Can Language Models Solve Olympiad Programming? [40.54366634332231]
This paper introduces the USACO benchmark with 307 problems from the USA Computing Olympiad.
We construct and test a range of LM inference methods for competitive programming for the first time.
We find GPT-4 only achieves an 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting.
arXiv Detail & Related papers (2024-04-16T23:27:38Z)
- Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning [24.386388107656334]
This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering.
We present a new dataset, AlgoVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles.
arXiv Detail & Related papers (2024-03-06T17:15:04Z)
- PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems? [27.696027301600793]
We present PuzzleBench, a dataset of 31 such challenging problems along with a few solved instances for each problem.
These problems are all first order, i.e., they can be instantiated with problem instances of varying sizes, and most of them are NP-hard.
We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset.
In response, we propose a new approach, Puzzle-LM, which combines LLMs with both symbolic solvers and program interpreters.
arXiv Detail & Related papers (2024-02-04T20:56:09Z)
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
- Can Language Models Solve Graph Problems in Natural Language? [51.28850846990929]
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures.
We propose NLGraph, a benchmark of graph-based problem solving designed in natural language.
arXiv Detail & Related papers (2023-05-17T08:29:21Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL), which read natural language problems and generate programs as intermediate reasoning steps.
PaL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
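As a rough sketch of the program-aided idea (illustrative only; the variable names and word problem here are invented, and PaL's actual prompting and evaluation pipeline is more involved), the model's output is itself a small program, and the interpreter, rather than the model, produces the final answer:

```python
# Hypothetical model output for a grade-school word problem ("Roger has 5
# tennis balls and buys 2 cans of 3 balls each; how many now?"). The
# generated program is the reasoning; the runtime does the arithmetic.
generated_program = """
tennis_balls = 5
bought_balls = 2 * 3  # two cans of three balls each
answer = tennis_balls + bought_balls
"""

scope = {}
exec(generated_program, scope)  # offload the solution step to the interpreter
print(scope["answer"])          # -> 11
```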
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
- Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
- On Theoretical Complexity and Boolean Satisfiability [0.0]
This thesis introduces some of the most central concepts in the Theory of Computing, with a focus on Boolean Satisfiability (SAT).
It then explores some of its tractable as well as intractable variants, such as Horn-SAT and 3-SAT.
Finally, we establish reductions from 3-SAT to some of the famous NP-complete graph problems.
arXiv Detail & Related papers (2021-12-22T10:13:34Z)
- Solving Linear Algebra by Program Synthesis [1.0660480034605238]
We solve MIT's Linear Algebra 18.06 course and Columbia University's Computational Linear Algebra COMS3251 courses with perfect accuracy by interactive program synthesis.
This surprisingly strong result is achieved by turning the course questions into programming tasks and then running the programs to produce the correct answers.
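As a tiny illustration of that recipe (a hedged sketch with an invented question, not one of the actual course problems), a question like "find the eigenvalues of a given matrix" becomes a short program whose output is the answer:

```python
import numpy as np

# "Find the eigenvalues of [[2, 1], [1, 2]]" rephrased as a programming task.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(np.linalg.eigvals(A))  # -> eigenvalues 3 and 1 (order may vary)
```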
arXiv Detail & Related papers (2021-11-16T01:16:43Z)
- PuzzLing Machines: A Challenge on Learning From Small Data [64.513459448362]
We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.
Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages.
We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
arXiv Detail & Related papers (2020-04-27T20:34:26Z)