Neural Transition-based Parsing of Library Deprecations
- URL: http://arxiv.org/abs/2212.12584v1
- Date: Fri, 23 Dec 2022 20:48:33 GMT
- Title: Neural Transition-based Parsing of Library Deprecations
- Authors: Petr Babkin, Nacho Navarro, Salwa Alamir, Sameena Shah
- Abstract summary: This paper tackles the problem of automating code updates to fix deprecated API usages of open source libraries by analyzing their release notes.
Our system employs a three-tier architecture: first, a web crawler service retrieves deprecation documentation from the web; then a specially built parser processes those text documents into tree-structured representations.
To confirm the effectiveness of our method, we gathered and labeled a set of 426 API deprecations from 7 well-known Python data science libraries, and demonstrated our approach decisively outperforms a non-trivial neural machine translation baseline.
- Score: 3.6382354548339295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper tackles the challenging problem of automating code updates to fix
deprecated API usages of open source libraries by analyzing their release
notes. Our system employs a three-tier architecture: first, a web crawler
service retrieves deprecation documentation from the web; then a specially
built parser processes those text documents into tree-structured
representations; finally, a client IDE plugin locates and fixes identified
deprecated usages of libraries in a given codebase. The focus of this paper in
particular is the parsing component. We introduce a novel transition-based
parser in two variants: one based on a classical feature-engineered classifier and one on
a neural tree encoder. To confirm the effectiveness of our method, we gathered
and labeled a set of 426 API deprecations from 7 well-known Python data science
libraries, and demonstrated our approach decisively outperforms a non-trivial
neural machine translation baseline.
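The abstract describes the parser only at a high level. As a rough illustration of how a transition-based parser of this kind operates, here is a minimal Python sketch of a generic shift-reduce loop over a tokenized deprecation sentence; the two-action inventory, the Node type, and the pluggable predict_action classifier are simplifying assumptions for illustration, not the paper's actual transition system.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in the output tree; leaves are input tokens."""
    token: str
    children: list = field(default_factory=list)

# Hypothetical two-action inventory; the paper's transition system is richer.
SHIFT, REDUCE = "SHIFT", "REDUCE"

def parse(tokens, predict_action):
    """Greedy transition-based parsing: repeatedly ask a classifier for the
    next action until the buffer is empty and the stack holds one tree."""
    stack, buffer = [], list(tokens)
    while buffer or len(stack) > 1:
        action = predict_action(stack, buffer)
        if action == SHIFT and buffer:
            # Move the next input token onto the stack as a new leaf.
            stack.append(Node(buffer.pop(0)))
        elif action == REDUCE and len(stack) > 1:
            # Attach the top of the stack as a child of the node beneath it.
            child = stack.pop()
            stack[-1].children.append(child)
        else:
            # Illegal prediction: take the only legal move so parsing
            # always terminates.
            if buffer:
                stack.append(Node(buffer.pop(0)))
            else:
                child = stack.pop()
                stack[-1].children.append(child)
    return stack[0] if stack else None

# Trivial baseline policy: shift everything, then reduce. A learned
# classifier (feature-engineered or neural) would replace this lambda.
tree = parse("Deprecated since 1.0 use new_func instead".split(),
             lambda stack, buffer: SHIFT if buffer else REDUCE)
```

In the paper's two variants, predict_action is the learned component: hand-crafted features fed to a classical classifier in one, and a neural tree encoder scoring the partial trees on the stack in the other.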
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - DocCGen: Document-based Controlled Code Generation [33.19206322891497]
DocCGen is a framework that can leverage rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process.
Our experiments show that DocCGen consistently improves different-sized language models across all six evaluation metrics.
arXiv Detail & Related papers (2024-06-17T08:34:57Z) - Lightweight Syntactic API Usage Analysis with UCov [0.0]
We present a novel conceptual framework designed to assist library maintainers in understanding the interactions allowed by their APIs.
These customizable models enable library maintainers to improve their design ahead of release, reducing friction during evolution.
We implement these models for Java libraries in a new tool UCov and demonstrate its capabilities on three libraries exhibiting diverse styles of interaction.
arXiv Detail & Related papers (2024-02-19T10:33:41Z) - Neural Models for Source Code Synthesis and Completion [0.0]
Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippets.
Current approaches mainly involve hard-coded, rule-based systems based on semantic parsing.
We present sequence-to-sequence deep learning models and training paradigms to map NL to general-purpose programming languages.
arXiv Detail & Related papers (2024-02-08T17:10:12Z) - LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc, LILO's auto-documentation procedure, boosts performance by helping the synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Structured Dialogue Discourse Parsing [79.37200787463917]
Dialogue discourse parsing aims to uncover the internal structure of a multi-participant conversation.
We propose a principled method that improves upon previous work from two perspectives: encoding and decoding.
Experiments show that our method achieves a new state of the art, surpassing the previous model by 2.3 points on STAC and 1.5 points on Molweni.
arXiv Detail & Related papers (2023-06-26T22:51:01Z) - Generate rather than Retrieve: Large Language Models are Strong Context
Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer (a minimal sketch of this two-step pattern appears after this list).
arXiv Detail & Related papers (2022-09-21T01:30:59Z) - Evaluating the Impact of Source Code Parsers on ML4SE Models [3.699097874146491]
We evaluate two models, namely Supernorm2Seq and TreeLSTM, in the method name prediction task.
We show that trees built by different parsers vary in their structure and content.
We then analyze how this diversity affects the models' quality.
arXiv Detail & Related papers (2022-06-17T12:10:04Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - fastai: A Layered API for Deep Learning [1.7223564681760164]
fastai is a deep learning library which provides practitioners with high-level components.
It provides researchers with low-level components that can be mixed and matched to build new approaches.
arXiv Detail & Related papers (2020-02-11T21:16:48Z)
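Of the entries above, GenRead spells out its pipeline concretely enough to sketch. Below is a minimal, hypothetical rendering of the generate-then-read pattern; the prompt wording, the llm callable, and the n_docs parameter are assumptions for illustration, not details from the paper.

```python
from typing import Callable, List

def generate_then_read(question: str, llm: Callable[[str], str],
                       n_docs: int = 3) -> str:
    # Step 1: generate contextual documents instead of retrieving them.
    docs: List[str] = [
        llm(f"Generate a background document to answer: {question}")
        for _ in range(n_docs)
    ]
    # Step 2: read the generated documents to produce the final answer.
    context = "\n\n".join(docs)
    return llm("Answer the question using the context below.\n\n"
               f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```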