Parsing Fortran-77 with proprietary extensions
- URL: http://arxiv.org/abs/2309.02019v1
- Date: Tue, 5 Sep 2023 07:54:02 GMT
- Title: Parsing Fortran-77 with proprietary extensions
- Authors: Younoussa Sow, Larisa Safina, Léandre Brault, Papa Ibou Diouf,
Stéphane Ducasse, Nicolas Anquetil
- Abstract summary: Many organizations still rely on old code written in "obsolete" programming languages.
One difficulty of working with these "veteran languages" is being able to parse the source code to build a representation of it.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Far from the latest innovations in software development, many organizations
still rely on old code written in "obsolete" programming languages. Because
this source code is old and proven, it often contributes significantly to the
continuing success of these organizations. Yet, to keep the applications
relevant and running in an evolving environment, they sometimes need to be
updated or migrated to new languages or new platforms. One difficulty of
working with these "veteran languages" is being able to parse the source code
to build a representation of it. Parsing can also allow modern software
development tools and IDEs to offer better support to these veteran languages.
We initiated a project between our group and the Framatome company to help
migrate old Fortran-77 with proprietary extensions (called Esope) into more
modern Fortran. In this paper, we explain how we parsed the Esope language with
a combination of an island grammar and a regular parser to build an abstract
syntax tree of the code.
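The abstract does not describe the mechanics, so what follows is only a minimal Python sketch of the general island-grammar idea, not the authors' implementation. An island grammar describes just the constructs of interest (the "islands", here the proprietary extensions) precisely and treats everything else as uninterpreted "water". One plausible way to combine it with a regular parser, assumed here, is to recognize the extension statements, keep them as structured nodes, and rewrite them as fixed-form comment lines so that an ordinary Fortran-77 parser accepts the remaining code. The statement syntax (`segment, name` ... `end segment`, `segini`/`segact`/`segdes`/`segsup, name`) is loosely modeled on Esope's segment statements and is an assumption. The `c@_` comment marker and all function and class names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical island patterns, loosely modeled on Esope-style segment
# statements. The exact keywords and syntax are assumptions for illustration.
SEG_BEGIN = re.compile(r"^\s+segment\s*,\s*(\w+)\s*$", re.IGNORECASE)
SEG_END = re.compile(r"^\s+end\s+segment\s*$", re.IGNORECASE)
SEG_CMD = re.compile(r"^\s+(segini|segact|segdes|segsup)\s*,\s*(\w+)\s*$",
                     re.IGNORECASE)


@dataclass
class Water:
    """Ordinary Fortran-77 text; the island grammar does not interpret it."""
    text: str


@dataclass
class SegmentDecl:
    """Island: a segment declaration block and its member lines."""
    name: str
    members: List[str] = field(default_factory=list)


@dataclass
class SegmentCmd:
    """Island: a one-line segment operation such as 'segact, p'."""
    op: str
    target: str


Node = Union[Water, SegmentDecl, SegmentCmd]


def parse_islands(lines: List[str]) -> List[Node]:
    """Recognize extension 'islands' and keep every other line as 'water'."""
    nodes: List[Node] = []
    current: Optional[SegmentDecl] = None  # open segment block, if any
    for line in lines:
        if current is not None:
            if SEG_END.match(line):
                nodes.append(current)
                current = None
            else:
                current.members.append(line.strip())
            continue
        m = SEG_BEGIN.match(line)
        if m:
            current = SegmentDecl(m.group(1).lower())
            continue
        m = SEG_CMD.match(line)
        if m:
            nodes.append(SegmentCmd(m.group(1).lower(), m.group(2).lower()))
        else:
            nodes.append(Water(line))
    if current is not None:
        raise SyntaxError(f"unterminated segment {current.name!r}")
    return nodes


def to_plain_fortran(nodes: List[Node], marker: str = "c@_") -> List[str]:
    """Turn islands into fixed-form comment lines ('c' in column 1) so a
    standard Fortran-77 parser can process the file; the marker is arbitrary."""
    out: List[str] = []
    for n in nodes:
        if isinstance(n, Water):
            out.append(n.text)
        elif isinstance(n, SegmentDecl):
            out.append(f"{marker}segment, {n.name}")
            out.extend(f"{marker}  {m}" for m in n.members)
            out.append(f"{marker}end segment")
        else:
            out.append(f"{marker}{n.op}, {n.target}")
    return out
```

For example, a subroutine containing a `segment, values` block followed by `segact, values` yields one `SegmentDecl`, one `SegmentCmd`, and `Water` nodes for the plain Fortran lines. The water is what would be handed to a real Fortran-77 parser, and the island nodes would later be merged back into its AST. The sketch is deliberately line-based; a production island grammar would more likely work at token level and handle continuation lines.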
Related papers
- StmtTree: An Easy-to-Use yet Versatile Fortran Transformation Toolkit
We present StmtTree, a new Fortran code transformation toolkit to address this issue.
StmtTree abstracts the Fortran grammar into statement tree, offering both a low-level representation manipulation API and a high-level, easy-to-use query and manipulation mini-language.
Experiments show that StmtTree adapts well to legacy Fortran-77 code, and that complex tools, such as one that removes unused statements, can be developed in fewer than 100 lines of Python code.
(arXiv, 2024-07-08)
- Transforming C++11 Code to C++03 to Support Legacy Compilation Environments
We create a source code transformation framework to automatically backport code written according to the C++11 standard to its functionally equivalent C++03 variant.
This paper reports on the technical details of the transformation engine, and our experiences in applying it on two large industrial code bases and four open-source systems.
(arXiv, 2024-05-12)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
(arXiv, 2024-05-03)
- IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
(arXiv, 2024-03-06)
- ChatDev: Communicative Agents for Software Development
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate.
These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
(arXiv, 2023-07-16)
- COMEX: A Tool for Generating Customized Source Code Representations
COMEX is a framework that allows researchers and developers to create and combine multiple code-views.
It can analyze both method-level and program-level snippets by using both intra-procedural and inter-procedural analysis.
It is built on tree-sitter, a widely used incremental parsing tool that supports over 40 languages.
(arXiv, 2023-07-10)
- CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
We introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation.
CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages.
(arXiv, 2023-03-30)
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
(arXiv, 2022-03-16)
- Toward Modern Fortran Tooling and a Thriving Developer Community
Fortran is the oldest high-level programming language that remains in use today.
It is one of the dominant languages used for compute-intensive scientific and engineering applications.
In this paper, we report on the progress to date and outline the next steps.
(arXiv, 2021-09-15)
- On the Evolution of Programming Languages
The paper tries to give supporting evidence that newer languages are more robust than their predecessors.
It presents an analysis of the most prominent programming languages, emphasizing how the features of existing languages have influenced the development of new ones.
Finally, it suggests a set of experimental languages that may come to dominate programming in the era of multi-core architectures.
(arXiv, 2020-06-27)
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state of the art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
(arXiv, 2020-04-20)