Toward Neurosymbolic Program Comprehension
- URL: http://arxiv.org/abs/2502.01806v1
- Date: Mon, 03 Feb 2025 20:38:58 GMT
- Title: Toward Neurosymbolic Program Comprehension
- Authors: Alejandro Velasco, Aya Garryyeva, David N. Palacio, Antonio Mastropaolo, Denys Poshyvanyk,
- Abstract summary: We advocate for a Neurosymbolic research direction that combines the strengths of existing DL techniques with traditional symbolic methods.
We present preliminary results for our envisioned approach, aimed at establishing the first Neurosymbolic Program framework.
- Score: 46.874490406174644
- License:
- Abstract: Recent advancements in Large Language Models (LLMs) have paved the way for Large Code Models (LCMs), enabling automation in complex software engineering tasks, such as code generation, software testing, and program comprehension, among others. Tools like GitHub Copilot and ChatGPT have shown substantial benefits in supporting developers across various practices. However, the ambition to scale these models to trillion-parameter sizes, exemplified by GPT-4, poses significant challenges that limit the usage of Artificial Intelligence (AI)-based systems powered by large Deep Learning (DL) models. These include rising computational demands for training and deployment and issues related to trustworthiness, bias, and interpretability. Such factors can make managing these models impractical for many organizations, while their "black-box'' nature undermines key aspects, including transparency and accountability. In this paper, we question the prevailing assumption that increasing model parameters is always the optimal path forward, provided there is sufficient new data to learn additional patterns. In particular, we advocate for a Neurosymbolic research direction that combines the strengths of existing DL techniques (e.g., LLMs) with traditional symbolic methods--renowned for their reliability, speed, and determinism. To this end, we outline the core features and present preliminary results for our envisioned approach, aimed at establishing the first Neurosymbolic Program Comprehension (NsPC) framework to aid in identifying defective code components.
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z) - The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
New architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv Detail & Related papers (2024-02-02T20:33:14Z) - A foundational neural operator that continuously learns without
forgetting [1.0878040851638]
We introduce the concept of the Neural Combinatorial Wavelet Neural Operator (NCWNO) as a foundational model for scientific computing.
The NCWNO is specifically designed to excel in learning from a diverse spectrum of physics and continuously adapt to the solution operators associated with parametric partial differential equations (PDEs)
The proposed foundational model offers two key advantages: (i) it can simultaneously learn solution operators for multiple parametric PDEs, and (ii) it can swiftly generalize to new parametric PDEs with minimal fine-tuning.
arXiv Detail & Related papers (2023-10-29T03:20:10Z) - CodeRL: Mastering Code Generation through Pretrained Models and Deep
Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z) - Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and
Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z) - Understanding Neural Code Intelligence Through Program Simplification [3.9704927572880253]
We propose a model-agnostic approach to identify critical input features for models in code intelligence systems.
Our approach, SIVAND, uses simplification techniques that reduce the size of input programs of a CI model.
We believe that SIVAND's extracted features may help understand neural CI systems' predictions and learned behavior.
arXiv Detail & Related papers (2021-06-07T05:44:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.