Towards Top-Down Automated Development in Limited Scopes: A
Neuro-Symbolic Framework from Expressibles to Executables
- URL: http://arxiv.org/abs/2209.01566v4
- Date: Thu, 24 Aug 2023 08:16:38 GMT
- Title: Towards Top-Down Automated Development in Limited Scopes: A
Neuro-Symbolic Framework from Expressibles to Executables
- Authors: Jian Gu, Harald C. Gall
- Abstract summary: We build a taxonomy of code data, namely a code taxonomy, by categorizing code information.
We introduce a three-layer semantic pyramid (SP) to associate text data and code data.
We propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity.
- Score: 4.844958528198992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep code generation is a topic of deep learning for software
engineering (DL4SE), which adopts neural models to generate code for the
intended functions. Since end-to-end neural methods lack domain knowledge and
awareness of software hierarchy, they tend to perform poorly on project-level
tasks. To systematically explore potential improvements to code generation, we
let it participate in the whole top-down development process, from
\emph{expressibles} to \emph{executables}, which is feasible in limited
scopes. In the process, it benefits from massive samples, features, and
knowledge. As the foundation, we suggest building a taxonomy of code data,
namely a code taxonomy, by leveraging the categorization of code information.
Moreover, we introduce a three-layer semantic pyramid (SP) to associate text
data and code data. It identifies information at different abstraction levels,
thereby introducing domain knowledge about development and revealing the
hierarchy of software. Furthermore, we propose a semantic pyramid framework
(SPF) as our approach, focusing on software of high modularity and low
complexity. SPF divides the code generation process into stages and reserves
spots for potential interactions. In addition, we conceive preliminary
applications in software development to validate the neuro-symbolic framework.
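For a concrete sense of the staged design, below is a minimal Python sketch of a three-layer pipeline with interaction hooks. The layer names ("intent", "design", "code"), the class names, and the placeholder refinement step are illustrative assumptions; the abstract only states that SPF splits generation into stages and reserves spots for interaction.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A minimal sketch of a staged, top-down generation pipeline in the spirit
# of SPF. Layer names and hook signatures are illustrative assumptions.

@dataclass
class PyramidLayer:
    name: str                                # abstraction level, e.g. "intent"
    artifacts: List[str] = field(default_factory=list)

class SemanticPyramidPipeline:
    """Runs generation stage by stage, reserving spots for interaction."""

    def __init__(self):
        # Three hypothetical abstraction levels, from text down to code.
        self.layers = [PyramidLayer("intent"),   # natural-language goals
                       PyramidLayer("design"),   # module/function sketches
                       PyramidLayer("code")]     # executable units
        self.hooks: List[Callable[[PyramidLayer], None]] = []

    def add_hook(self, hook: Callable[[PyramidLayer], None]) -> None:
        """Reserve a spot for human or tool interaction between stages."""
        self.hooks.append(hook)

    def run(self, request: str) -> List[str]:
        self.layers[0].artifacts = [request]
        for upper, lower in zip(self.layers, self.layers[1:]):
            # Placeholder refinement: a real system would call a neural
            # model conditioned on the upper layer's artifacts here.
            lower.artifacts = [f"{lower.name}({a})" for a in upper.artifacts]
            for hook in self.hooks:
                hook(lower)                  # e.g. let a developer edit it
        return self.layers[-1].artifacts

pipeline = SemanticPyramidPipeline()
pipeline.add_hook(lambda layer: print(f"review {layer.name}: {layer.artifacts}"))
print(pipeline.run("sort a list of users by signup date"))
```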
Related papers
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Various experiments and ablations on four datasets, covering both C++ and Python, validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
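As a rough illustration of what a "graphical view" of a code block can look like, the sketch below extracts simple data-flow edges (definition of a variable, then a use of it) from a Python snippet with the standard `ast` module. CodeGRAG combines control flow with data flow and targets C++ as well as Python; this toy covers only a crude data-flow half, with an edge representation invented for the example.

```python
import ast
from collections import defaultdict

def dataflow_edges(source: str):
    """Collect (def_line, use_line, var) edges: a crude data-flow view.

    Note: ast.walk traverses breadth-first, not in execution order, so this
    is only an approximation suitable for a sketch.
    """
    tree = ast.parse(source)
    defs = defaultdict(list)   # variable name -> lines where it is assigned
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs[node.id].append(node.lineno)
            elif isinstance(node.ctx, ast.Load):
                for def_line in defs[node.id]:
                    edges.append((def_line, node.lineno, node.id))
    return edges

snippet = """
total = 0
for x in range(10):
    total = total + x
print(total)
"""
for edge in dataflow_edges(snippet):
    print(edge)
```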
- Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit [63.82016263181941]
Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora.
A thriving research community has already formed around code intelligence.
arXiv Detail & Related papers (2023-12-30T17:48:37Z)
- TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation [9.477734501499274]
We present TransformCode, a novel framework that learns code embeddings via contrastive learning.
Our framework is encoder-agnostic and language-agnostic, which means that it can leverage any encoder model and handle any programming language.
arXiv Detail & Related papers (2023-11-10T09:05:23Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
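CIRS defines its score over structural and logical attributes of the AST; the exact formula is in the paper. Below is only a hedged sketch of the kind of AST statistics such a score could combine, namely node count, tree depth, and a count of branching or looping constructs, using Python's `ast` module. The weights in `toy_complexity_score` are made up for illustration.

```python
import ast

def ast_stats(source: str) -> dict:
    """Crude structural and logical statistics over a Python AST."""
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    nodes = list(ast.walk(tree))
    logic = (ast.If, ast.For, ast.While, ast.BoolOp, ast.Compare)
    return {
        "node_count": len(nodes),
        "max_depth": depth(tree),
        "logic_nodes": sum(isinstance(n, logic) for n in nodes),
    }

def toy_complexity_score(source: str) -> float:
    # Made-up weights, purely illustrative: the real CIRS formula differs.
    s = ast_stats(source)
    return 0.1 * s["node_count"] + 0.5 * s["max_depth"] + 1.0 * s["logic_nodes"]

code = "def f(xs):\n    return [x for x in xs if x > 0]"
print(ast_stats(code), toy_complexity_score(code))
```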
- Adding Context to Source Code Representations for Deep Learning [13.676416860721877]
We argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed.
We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model.
arXiv Detail & Related papers (2022-07-30T12:47:32Z)
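One simple way to realize this idea is to serialize a function together with the signatures of its callers and callees before feeding it to a model. The sketch below does exactly that with made-up data structures; it is not the paper's actual encoding.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FunctionInfo:
    signature: str
    body: str
    callers: List[str] = field(default_factory=list)   # caller signatures
    callees: List[str] = field(default_factory=list)   # callee signatures

def serialize_with_context(fn: FunctionInfo) -> str:
    """Flatten code plus call-hierarchy context into one model input."""
    parts = [f"# callers: {'; '.join(fn.callers) or 'none'}",
             f"# callees: {'; '.join(fn.callees) or 'none'}",
             fn.signature, fn.body]
    return "\n".join(parts)

fn = FunctionInfo(
    signature="def total_price(cart):",
    body="    return sum(item.price for item in cart)",
    callers=["def checkout(session):"],
)
print(serialize_with_context(fn))
```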
- A Survey of Deep Learning Models for Structural Code Understanding [21.66270320648155]
We present a comprehensive overview of the structures formed from code data.
We categorize the models for understanding code in recent years into two groups: sequence-based and graph-based models.
We also introduce metrics, datasets and the downstream tasks.
arXiv Detail & Related papers (2022-05-03T03:56:17Z)
- Contrastive Learning for Source Code with Structural and Functional Properties [66.10710134948478]
We present BOOST, a novel self-supervised model that focuses pre-training on the characteristics of source code.
We employ automated, structure-guided code transformation algorithms that generate functionally equivalent code that looks drastically different from the original.
We train our model with a contrastive learning objective that brings functionally equivalent code closer together and pushes distinct code further apart.
arXiv Detail & Related papers (2021-10-08T02:56:43Z)
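The contrastive objective described here is of the InfoNCE family: pull the embeddings of functionally equivalent programs together and push others apart. The NumPy sketch below shows a generic version of such a loss for one positive pair among in-batch negatives; BOOST's exact formulation may differ.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE: the anchor should match the positive, not negatives."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive sits at index 0

rng = np.random.default_rng(0)
anchor = rng.normal(size=64)                    # embedding of original code
positive = anchor + 0.1 * rng.normal(size=64)   # transformed equivalent code
negatives = [rng.normal(size=64) for _ in range(7)]
print(float(info_nce_loss(anchor, positive, negatives)))
```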
- Unsupervised Learning of Neurosymbolic Encoders [40.3575054882791]
We present a framework for the unsupervised learning of neurosymbolic encoders, i.e., encoders obtained by composing neural networks with symbolic programs from a domain-specific language.
Such a framework can naturally incorporate symbolic expert knowledge into the learning process and lead to more interpretable and factorized latent representations than fully neural encoders.
arXiv Detail & Related papers (2021-07-28T02:16:14Z)
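To illustrate the composition pattern (not the paper's actual DSL), the sketch below pairs a tiny stand-in for a neural feature extractor with a symbolic program that maps those features to an interpretable, factorized latent code.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))          # stand-in for a trained neural encoder

def neural_features(x: np.ndarray) -> np.ndarray:
    """Hypothetical neural stage: a linear map plus ReLU."""
    return np.maximum(W @ x, 0.0)

def symbolic_program(h: np.ndarray) -> dict:
    """Hypothetical DSL program: thresholds features into named factors."""
    return {
        "is_active": bool(h[0] > 0.5),
        "magnitude": "high" if h.sum() > 2.0 else "low",
    }

def neurosymbolic_encode(x: np.ndarray) -> dict:
    # Compose the neural network with a symbolic program, as in the framework.
    return symbolic_program(neural_features(x))

print(neurosymbolic_encode(rng.normal(size=8)))
```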
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)