Towards Neural Functional Program Evaluation
- URL: http://arxiv.org/abs/2112.04630v1
- Date: Thu, 9 Dec 2021 00:20:29 GMT
- Title: Towards Neural Functional Program Evaluation
- Authors: Torsten Scholak and Jonathan Pilault and Joey Velez-Ginorio
- Abstract summary: We introduce a new program generation mechanism that allows control over syntactic sugar for semantically equivalent programs.
Experiments reveal that neural functional program evaluation performs surprisingly well, achieving exact program match scores in the high 90% range.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper explores the capabilities of current transformer-based language
models for program evaluation of simple functional programming languages. We
introduce a new program generation mechanism that allows control over syntactic
sugar for semantically equivalent programs. T5 experiments reveal that neural
functional program evaluation performs surprisingly well, achieving exact
program match scores in the high 90% range for most in-distribution and
out-of-distribution tests. Using pretrained T5 weights has significant
advantages over random initialization. We present three datasets, and evaluate
on them, to study generalization abilities specific to functional programs
based on type, function composition, and reduction steps. Code and data are
publicly available at https://github.com/ElementAI/neural-interpreters.
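To make the sugar-control idea concrete, here is a minimal sketch (a toy grammar of my own, not the paper's actual generator) of a language in which `let` is sugar for lambda application, so one underlying program can be serialized in two semantically equivalent surface forms:

```python
import random

def gen_expr(depth=2, rng=random):
    """Randomly generate a tiny expression tree with `let` bindings."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", str(rng.randint(0, 9))])
    return ("let", "x", gen_expr(depth - 1, rng), gen_expr(depth - 1, rng))

def render(e, sugar=True):
    """Emit the sugared form, or desugar `let` into lambda application."""
    if isinstance(e, str):
        return e
    _, var, bound, body = e
    b, s = render(bound, sugar), render(body, sugar)
    return f"let {var} = {b} in {s}" if sugar else f"((\\{var} -> {s}) {b})"

rng = random.Random(0)
e = gen_expr(rng=rng)
print(render(e, sugar=True))    # sugared surface form
print(render(e, sugar=False))   # desugared, semantically equivalent form
```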
Related papers
- ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning [54.70811660561151]
Existing math datasets evaluate the reasoning abilities of large language models (LLMs) by either using the final answer or the intermediate reasoning steps derived from static examples.
We seek to use symbolic programs as a means for automated evaluation, checking whether a model can consistently produce correct final answers across various inputs to the program.
We observe significant accuracy drops using our proposed evaluation compared with original static examples, suggesting the fragility of math reasoning in state-of-the-art LLMs.
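The protocol can be sketched as follows; `ask_model`, the template, and the injected failure mode are hypothetical stand-ins, not ReasonAgain's implementation:

```python
import random

def symbolic_answer(apples, friends):
    """Executable symbolic program extracted for one word problem."""
    return apples // friends

def ask_model(question: str) -> int:
    """Hypothetical stand-in for an LLM call; simulates a model that is
    usually right but fragile on some inputs."""
    a, f = [int(tok) for tok in question.split() if tok.isdigit()]
    return a // f + (1 if a % 10 == 7 else 0)   # injected fragility

template = "If you split {a} apples equally among {f} friends, how many does each friend get?"
rng = random.Random(0)
trials = [(rng.randint(10, 99), rng.randint(2, 9)) for _ in range(50)]
correct = sum(ask_model(template.format(a=a, f=f)) == symbolic_answer(a, f)
              for a, f in trials)
print(f"consistency under input perturbation: {correct}/{len(trials)}")
```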
arXiv Detail & Related papers (2024-10-24T18:02:37Z)
- Data Augmentation by Fuzzing for Neural Test Generation [7.310817657037053]
We introduce *FuzzAug*, a novel data augmentation technique that brings the benefits of fuzzing to large language models.
Our evaluations show that models trained on data augmented by FuzzAug increase assertion accuracy by 5%, improve the compilation rate by more than 10%, and generate unit test functions with 5% more branch coverage.
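As a rough illustration of the augmentation idea (not the FuzzAug implementation), mutated seed inputs can be turned into new assertions by using the function under test as its own oracle:

```python
import random

def function_under_test(xs):
    return sorted(xs)

def mutate(seed, rng):
    """One random structural edit to a seed input."""
    xs = list(seed)
    op = rng.choice(["insert", "delete", "swap"])
    if op == "insert" or not xs:
        xs.insert(rng.randrange(len(xs) + 1), rng.randint(-99, 99))
    elif op == "delete":
        xs.pop(rng.randrange(len(xs)))
    else:
        i, j = rng.randrange(len(xs)), rng.randrange(len(xs))
        xs[i], xs[j] = xs[j], xs[i]
    return xs

rng = random.Random(0)
seeds = [[3, 1, 2]]                 # inputs harvested from existing tests
new_assertions = []
for _ in range(5):
    xs = mutate(rng.choice(seeds), rng)
    seeds.append(xs)                # mutated inputs become new seeds
    new_assertions.append(
        f"assert function_under_test({xs!r}) == {function_under_test(xs)!r}"
    )
print("\n".join(new_assertions))
```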
arXiv Detail & Related papers (2024-06-12T22:09:27Z)
- FIND: A Function Description Benchmark for Evaluating Interpretability Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
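A minimal sketch of how such a description might be scored behaviorally; the functions and the agreement metric here are illustrative, not FIND's actual harness:

```python
import random

def target(x):
    """Hidden function from the benchmark suite."""
    return 3 * x if x > 0 else 0

def candidate(x):
    """The method's description, rendered as code:
    'multiplies positive inputs by three, returns zero otherwise'."""
    return 3 * max(x, 0)

rng = random.Random(0)
probes = [rng.randint(-50, 50) for _ in range(1000)]
agreement = sum(target(x) == candidate(x) for x in probes) / len(probes)
print(f"behavioral agreement: {agreement:.2%}")  # 100.00% for this pair
```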
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
- Understanding Programs by Exploiting (Fuzzing) Test Cases [26.8259045248779]
We propose to incorporate the relationship between inputs and possible outputs/behaviors into learning, for achieving a deeper semantic understanding of programs.
To obtain inputs that are representative enough to trigger the execution of most parts of the code, we resort to fuzz testing and propose fuzz tuning.
The effectiveness of the proposed method is verified on two program understanding tasks, code clone detection and code classification, where it outperforms current state-of-the-art methods by large margins.
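A toy sketch of the data construction (not the paper's pipeline): each training instance pairs source code with behavior observed on fuzzed inputs, so the model sees semantics rather than syntax alone:

```python
import random

SOURCE = "def f(xs):\n    return max(xs) - min(xs)"

namespace = {}
exec(SOURCE, namespace)            # compile the snippet under study
f = namespace["f"]

rng = random.Random(0)
examples = []
for _ in range(3):
    xs = [rng.randint(0, 9) for _ in range(rng.randint(1, 5))]
    # One training instance: the code plus behavior on a fuzzed input.
    examples.append(f"{SOURCE}\n# f({xs}) == {f(xs)}")
print(examples[0])
```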
arXiv Detail & Related papers (2023-05-23T01:51:46Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder-based and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore the effect of mixture distributions and multi-epoch training of programming and natural languages on model performance.
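A prefix-LM is characterized by its attention mask: bidirectional over the prefix, causal afterwards. A minimal numpy sketch of that mask:

```python
import numpy as np

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """True where a query position (row) may attend to a key (column)."""
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    causal[:, :prefix_len] = True   # full prefix is visible to everyone
    return causal

print(prefix_lm_mask(6, 3).astype(int))
# Columns 0-2 are fully visible (encoder-style); columns 3-5 are
# lower-triangular (decoder-style), unifying the two model families.
```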
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
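A simplified sketch of the data side of this loop; the sampled programs below are hard-coded stand-ins for model outputs:

```python
def score(program: str, tests: list) -> float:
    """Fraction of the task's test cases a sampled program passes."""
    env = {}
    try:
        exec(program, env)
        return sum(env["f"](x) == y for x, y in tests) / len(tests)
    except Exception:
        return 0.0

tests = [(1, 2), (2, 4), (3, 6)]               # target: f(x) = 2 * x
samples = [                                     # stand-ins for model samples
    "def f(x): return x * 2",                   # fully correct   -> 1.0
    "def f(x): return x + x if x < 3 else 0",   # partially correct -> 2/3
    "def f(x): return x - 1",                   # incorrect       -> filtered
]
kept = [(p, s) for p in samples if (s := score(p, tests)) > 0]
for program, pass_rate in kept:                 # extra learning targets
    print(f"{pass_rate:.2f}  {program}")
```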
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Foundation Posteriors for Approximate Probabilistic Inference [11.64841553345271]
We formulate inference as masked language modeling in a probabilistic program.
We train a neural network to unmask the random values, defining an approximate posterior distribution.
We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of Stan programs.
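A toy illustration of the framing (not the paper's system): the latent draws in a probabilistic-program trace are masked out, and inference means predicting the masked values given the observed ones. The `unmask_model` interface in the comments is hypothetical:

```python
# Trace of a probabilistic program with its latent draws masked:
observed_trace = """
mu    = normal(0, 1)      sampled: [MASK]
sigma = gamma(2, 2)       sampled: [MASK]
x1    = normal(mu, sigma) observed: 1.9
x2    = normal(mu, sigma) observed: 2.1
"""
# A foundation posterior is a network trained to unmask such traces, e.g.
#     posterior_samples = unmask_model(observed_trace, n=100)   # hypothetical
# yielding values of mu and sigma consistent with the observations
# (mu near 2.0 in this example).
print(observed_trace)
```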
arXiv Detail & Related papers (2022-05-19T17:42:37Z)
- Neural Termination Analysis [9.973499664140158]
We train neural networks to act as ranking functions.
We map program states to values that are bounded from below and decrease as the program runs.
The existence of a valid ranking function proves that the program terminates.
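The two conditions translate into a simple trainable objective; a numpy sketch with a hand-written ranking function standing in for the network:

```python
import numpy as np

def ranking_loss(f_vals: np.ndarray) -> float:
    """f_vals[i] = f(state_i) along one execution trace."""
    bounded = np.maximum(0.0, -f_vals)                   # penalize f < 0
    decreasing = np.maximum(0.0, np.diff(f_vals) + 1.0)  # penalize slow decrease
    return float(bounded.mean() + decreasing.mean())

# States of `while x > 0: x -= 1` from x = 5, with candidate f(state) = x:
trace = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])
print(ranking_loss(trace))  # 0.0 -> f is a valid ranking function,
                            # which certifies that the loop terminates
```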
arXiv Detail & Related papers (2021-02-07T15:45:30Z)
- Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
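A loose sketch of the modular idea, with plain vectors standing in for learned modules and an illustrative `HOLE` embedding: known subexpressions contribute concrete values, holes contribute neural representations, and per-operator modules blend the two.

```python
import numpy as np

rng = np.random.default_rng(0)
HOLE = rng.normal(size=4)          # learned embedding for the hole `??`

def embed(v):
    """Lift a concrete value into the representation space."""
    return v if isinstance(v, np.ndarray) else np.array([v, 1.0, 0.0, 0.0])

def add_module(a, b):
    """Stand-in for a trained neural module for `+`."""
    return embed(a) + embed(b)

# Representation of the partial program (3 + ??) + 5:
rep = add_module(add_module(3, HOLE), 5)
print(rep)  # a vector the synthesizer can use to rank completions
```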
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
- On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations [25.96895574298886]
We evaluate the generalizability of neural program models with respect to semantic-preserving transformations.
We use three Java datasets of different sizes and three state-of-the-art neural network models for code.
Our results suggest that neural program models based on data and control dependencies in programs generalize better than neural program models based only on abstract syntax trees.
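One such transformation is variable renaming; here is a sketch on Python source for brevity (the paper's experiments use Java):

```python
import ast

class Rename(ast.NodeTransformer):
    """Consistently rename identifiers; semantics are unchanged."""
    def __init__(self, mapping):
        self.mapping = mapping
    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node
    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node
    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

src = "def total(prices):\n    return sum(prices)"
tree = Rename({"total": "f0", "prices": "v0"}).visit(ast.parse(src))
print(ast.unparse(tree))  # def f0(v0): return sum(v0)
# Same semantics; a robust code model should predict the same label
# for both versions of the program.
```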
arXiv Detail & Related papers (2020-07-31T20:39:20Z)
- Strong Generalization and Efficiency in Neural Programs [69.18742158883869]
We study the problem of learning efficient algorithms that strongly generalize in the framework of neural program induction.
By carefully designing the input / output interfaces of the neural model and through imitation, we are able to learn models that produce correct results for arbitrary input sizes.
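The imitation setup can be sketched with a classical teacher whose primitive steps become (state, action) pairs; the interface below (pointer position plus swap/noop actions) is illustrative, not the paper's, but its size-independence is what allows generalization to arbitrary input lengths:

```python
def bubble_sort_trace(xs):
    """Run the teacher algorithm, recording a (state, action) pair
    for every primitive step."""
    xs, trace = list(xs), []
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            action = "swap" if xs[i] > xs[i + 1] else "noop"
            trace.append(((tuple(xs), i), action))   # (state, action)
            if action == "swap":
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return trace

for state, action in bubble_sort_trace([3, 1, 2]):
    print(state, "->", action)   # imitation-learning training data
```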
arXiv Detail & Related papers (2020-07-07T17:03:02Z)