Beryllium: Neural Search for Algorithm Implementations
- URL: http://arxiv.org/abs/2305.15690v2
- Date: Sat, 1 Jul 2023 22:33:04 GMT
- Title: Beryllium: Neural Search for Algorithm Implementations
- Authors: Adithya Kulkarni, Mohna Chakraborty, Yonas Sium, Sai Charishma
Valluri, Wei Le, Qi Li
- Abstract summary: We design a new language named p-language to specify the algorithms and a static analyzer for the p-language to automatically extract information from the algorithm descriptions.
We embed the output of p-language (p-code) and source code in a common vector space using self-supervised machine learning methods to match algorithms with code without any manual annotation.
Beryllium significantly outperformed the state-of-the-art code search tools in both C and Java.
- Score: 14.11934122454653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore the feasibility of finding algorithm
implementations from code. Successfully matching code and algorithms can help
understand unknown code, provide reference implementations, and automatically
collect data for learning-based program synthesis. To achieve the goal, we
designed a new language named p-language to specify the algorithms and a static
analyzer for the p-language to automatically extract control flow, math, and
natural language information from the algorithm descriptions. We embedded the
output of p-language (p-code) and source code in a common vector space using
self-supervised machine learning methods to match algorithms with code without
any manual annotation. We developed a tool named Beryllium. It takes pseudo
code as a query and returns a list of ranked code snippets that likely match
the algorithm query. Our evaluation on the Stony Brook Algorithm Repository and
popular GitHub projects shows that Beryllium significantly outperformed the
state-of-the-art code search tools in both C and Java. Specifically, for 98.5%,
93.8%, and 66.2% of the queries, we found the algorithm implementation within
the top 25, top 10, and top 1 ranked results, respectively. Given 87 algorithm
queries, we found implementations for 74 algorithms in GitHub projects where we
did not know the algorithms beforehand.
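The abstract describes a retrieval pipeline: embed the pseudo-code query and candidate source snippets in a shared vector space, then return snippets ranked by similarity. A minimal sketch of that ranking step, using a toy bag-of-tokens similarity as a stand-in for Beryllium's learned self-supervised encoders (function names here are illustrative, not from the paper):

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a learned encoder: a sparse bag-of-tokens vector.
    Beryllium instead trains embeddings with self-supervised learning."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(cnt * b[tok] for tok, cnt in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_snippets(pseudo_code_query, snippets):
    """Return code snippets sorted by similarity to the query, best first."""
    q = embed(pseudo_code_query)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
```

A query such as `rank_snippets("shortest path from source node", corpus)` would surface snippets sharing the query's vocabulary first; the actual system replaces the token overlap with embeddings learned jointly over p-code and source code, so matches do not depend on shared identifiers.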
Related papers
- Active Learning of Mealy Machines with Timers [3.087487671441056]
We present the first algorithm for query learning of a class of Mealy machines with timers in a black-box context.
Our algorithm is an extension of the L# algorithm of Vaandrager et al. to a timed setting.
arXiv Detail & Related papers (2024-03-04T13:20:52Z) - Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages [104.90415092306219]
Four formalisms are shown to be equivalent: tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA).
We design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations).
For EPDA, our algorithm is both more space-efficient and more time-efficient than the algorithm of Alonso et al. (2001), by a factor of $\mathcal{O}(|\Gamma|^2)$ and …
arXiv Detail & Related papers (2023-10-23T18:26:00Z) - ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle
Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems.
We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.
Experiments show that models equipped with ALGO achieve an 8x better one-submission pass rate than the Codex model and a 2.6x better one-submission pass rate than CodeT.
arXiv Detail & Related papers (2023-05-24T00:10:15Z) - imitation: Clean Imitation Learning Implementations [7.7064239657103375]
imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch.
We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms, and a preference comparison algorithm.
arXiv Detail & Related papers (2022-11-22T03:11:29Z) - Algorithms for Weighted Pushdown Automata [118.67634716230025]
Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks.
We develop novel algorithms that operate directly on WPDAs.
arXiv Detail & Related papers (2022-10-13T10:21:31Z) - Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z) - The CLRS Algorithmic Reasoning Benchmark [28.789225199559834]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.
We propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook.
Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms.
arXiv Detail & Related papers (2022-05-31T09:56:44Z) - An Approach for Automatic Construction of an Algorithmic Knowledge Graph
from Textual Resources [3.723553383515688]
We introduce an approach for automatically developing a knowledge graph for algorithmic problems from unstructured data.
An algorithm KG will give additional context and explainability to the algorithm metadata.
arXiv Detail & Related papers (2022-05-13T18:59:23Z) - A Metaheuristic Algorithm for Large Maximum Weight Independent Set
Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs arising in this application are large, with hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
arXiv Detail & Related papers (2022-03-28T21:34:16Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.