Pointing to Subwords for Generating Function Names in Source Code
- URL: http://arxiv.org/abs/2011.04241v1
- Date: Mon, 9 Nov 2020 08:17:17 GMT
- Title: Pointing to Subwords for Generating Function Names in Source Code
- Authors: Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
- Abstract summary: We propose two strategies for copying low-frequency or out-of-vocabulary subwords in the input.
Our best-performing model showed an improvement over the conventional method in terms of our modified F1.
- Score: 43.36314933559263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We tackle the task of automatically generating a function name from source
code. Existing generators face difficulties in generating low-frequency or
out-of-vocabulary subwords. In this paper, we propose two strategies for
copying low-frequency or out-of-vocabulary subwords in the input. Our
best-performing model showed an improvement over the conventional method in
terms of our modified F1 and accuracy on the Java-small and Java-large datasets.
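
The copying strategies themselves are not spelled out in this summary. As a rough illustration, the sketch below shows a generic pointer-generator decoding step over subwords (mixing a generation distribution with an attention-based copy distribution), an assumed stand-in rather than the paper's exact formulation; the tensor names and the extended-vocabulary trick for OOV subwords are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pointer_generator_step(vocab_logits, attn_scores, src_ext_ids, p_gen, extended_size):
    """One decoding step that mixes generating a subword from the vocabulary
    with copying a subword from the input (pointer-generator style).

    vocab_logits:  (batch, vocab_size) decoder logits over the subword vocabulary
    attn_scores:   (batch, src_len)    unnormalized attention over input subwords
    src_ext_ids:   (batch, src_len)    ids of input subwords in an *extended*
                                       vocabulary, where OOV subwords get
                                       temporary ids >= vocab_size (assumption)
    p_gen:         (batch, 1)          probability of generating vs. copying
    extended_size: vocab_size + number of distinct OOV subwords in the batch
    """
    batch, vocab_size = vocab_logits.shape
    gen_dist = F.softmax(vocab_logits, dim=-1)   # distribution over in-vocab subwords
    copy_dist = F.softmax(attn_scores, dim=-1)   # distribution over input positions

    # Start from the generation distribution, padded with zeros for OOV slots.
    final = torch.zeros(batch, extended_size, device=vocab_logits.device)
    final[:, :vocab_size] = p_gen * gen_dist

    # Add copy mass onto the subwords actually present in the input; an OOV
    # subword can receive probability only through this copy term.
    final.scatter_add_(1, src_ext_ids, (1.0 - p_gen) * copy_dist)
    return final  # (batch, extended_size), sums to 1 per row
```

Low-frequency and out-of-vocabulary subwords get probability mass only through the copy term, which is exactly the failure mode of purely generative models that the paper targets.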
Related papers
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression [62.261476176242724]
We propose an unsupervised method to extract keywords and keyphrases from texts, based on a pre-trained language model (LM) and Shannon information.
Specifically, our method extracts the phrases having the highest conditional entropy under the LM.
arXiv Detail & Related papers (2023-08-25T14:23:40Z)
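
For intuition, here is a minimal sketch of scoring a candidate phrase by its conditional entropy under a pre-trained LM; GPT-2 via Hugging Face transformers is an assumed stand-in, and the paper's exact scoring and candidate extraction may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def phrase_entropy(context: str, phrase: str) -> float:
    """Mean entropy of the LM's next-token distribution over the phrase
    tokens, conditioned on the context; higher = harder to predict."""
    ctx_ids = tokenizer.encode(context)
    phr_ids = tokenizer.encode(" " + phrase)
    input_ids = torch.tensor([ctx_ids + phr_ids])
    with torch.no_grad():
        logits = model(input_ids).logits[0]  # (seq_len, vocab)
    # Next-token distributions at the positions that predict each phrase token.
    start = len(ctx_ids) - 1
    logp = torch.log_softmax(logits[start:start + len(phr_ids)], dim=-1)
    entropies = -(logp.exp() * logp).sum(dim=-1)
    return entropies.mean().item()

# Rank candidates: keep phrases with the highest conditional entropy.
candidates = ["neural networks", "the", "keyphrase extraction"]
ranked = sorted(candidates,
                key=lambda p: phrase_entropy("This paper studies", p),
                reverse=True)
```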
- Data Augmentation for Low-Resource Keyphrase Generation [46.52115499306222]
Keyphrase generation is the task of summarizing the contents of a given article into a few salient phrases (keyphrases).
Existing work on the task mostly relies on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically designed for keyphrase generation in purely resource-constrained domains.
arXiv Detail & Related papers (2023-05-29T09:20:34Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to predict syntactic sketches more easily at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
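
The summary gives only the high-level idea. The following sketch shows generic hierarchical residual quantization, where a dense encoding becomes a path of codebook indices from coarse to fine; it is an assumed simplification, not HRQ-VAE's actual training procedure.

```python
import torch

def hierarchical_quantize(z, codebooks):
    """Encode a dense vector z as a path of codebook indices, one per level.

    z:         (dim,) dense encoding
    codebooks: list of (num_codes, dim) tensors, ordered coarse to fine
    Returns the index path and the quantized reconstruction.
    """
    path, residual = [], z.clone()
    recon = torch.zeros_like(z)
    for codebook in codebooks:
        # Pick the code closest to what is still unexplained at this level.
        dists = torch.cdist(residual.unsqueeze(0), codebook).squeeze(0)
        idx = int(dists.argmin())
        path.append(idx)
        recon = recon + codebook[idx]
        residual = residual - codebook[idx]
    return path, recon

# Example: a 3-level hierarchy over 16-dim encodings, 8 codes per level.
codebooks = [torch.randn(8, 16) for _ in range(3)]
path, recon = hierarchical_quantize(torch.randn(16), codebooks)
```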
- Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network [10.425277173548212]
We propose a fast, accurate, and compact two-level hierarchical word-and-character-based recurrent neural network model.
We also address the problem of truecasing while ignoring token positions in the sentence.
arXiv Detail & Related papers (2021-08-26T17:54:35Z)
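
A compact sketch of the two-level idea, under assumed simplifications: character-level GRUs summarize each word, a word-level GRU adds sentence context, and a per-character head predicts casing. The paper's actual model and its handling of position invariance are more involved.

```python
import torch
import torch.nn as nn

class HierarchicalTruecaser(nn.Module):
    """Two-level word-and-character RNN: char-GRUs summarize each word,
    a word-level GRU adds sentence context, and a per-character head
    predicts whether each character should be uppercased."""

    def __init__(self, n_chars, char_dim=32, word_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_rnn = nn.GRU(char_dim, word_dim, batch_first=True)
        self.word_rnn = nn.GRU(word_dim, word_dim, batch_first=True)
        self.case_head = nn.Linear(word_dim + char_dim, 2)  # lower / upper

    def forward(self, words):
        # words: list of (1, word_len) char-id tensors for one sentence
        word_vecs, char_embs = [], []
        for w in words:
            emb = self.char_emb(w)            # (1, word_len, char_dim)
            _, h = self.char_rnn(emb)         # final state summarizes the word
            word_vecs.append(h[0])            # (1, word_dim)
            char_embs.append(emb)
        # Sentence context for each word from the word-level GRU.
        ctx, _ = self.word_rnn(torch.stack(word_vecs, dim=1))
        # Per-character casing logits, conditioned on the word's context.
        return [
            self.case_head(torch.cat(
                [ctx[:, i].expand(emb.size(1), -1), emb[0]], dim=-1))
            for i, emb in enumerate(char_embs)
        ]
```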
- A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code [14.904366372190943]
We propose a method, based on identifier anonymization, for handling out-of-vocabulary (OOV) identifiers.
Our method can be treated as a preprocessing step and therefore allows for easy implementation.
We show that the proposed OOV anonymization method significantly improves the performance of the Transformer on two code processing tasks.
arXiv Detail & Related papers (2020-10-23T20:52:46Z)
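
The preprocessing idea is simple to illustrate. In the sketch below, the placeholder format (VAR_1, VAR_2, ...) and the consistent reuse of placeholders within a snippet are assumptions, not details taken from the paper.

```python
def anonymize_oov_identifiers(tokens, vocab):
    """Replace out-of-vocabulary identifiers with numbered placeholders,
    reusing the same placeholder for repeated occurrences so that
    relationships between identifiers are preserved within the snippet."""
    mapping, out = {}, []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            if tok not in mapping:
                mapping[tok] = f"VAR_{len(mapping) + 1}"  # assumed format
            out.append(mapping[tok])
    return out, mapping

vocab = {"def", "return", "(", ")", ":", "+", "x"}
tokens = ["def", "frobnicate", "(", "x", "widget_count", ")", ":",
          "return", "x", "+", "widget_count"]
print(anonymize_oov_identifiers(tokens, vocab))
# (['def', 'VAR_1', '(', 'x', 'VAR_2', ')', ':',
#   'return', 'x', '+', 'VAR_2'], {'frobnicate': 'VAR_1', ...})
```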
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z)
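
SEG-Net's components are not detailed in this summary. The sketch below shows one generic form of coverage attention (additive attention with an accumulated-attention term), an assumed simplification of the layer-wise mechanism rather than SEG-Net's actual design.

```python
import torch
import torch.nn.functional as F

def coverage_attention_step(query, keys, coverage, w_q, w_k, w_c, v):
    """One additive-attention step with a coverage term.

    query:    (batch, dim)       current decoder state
    keys:     (batch, src, dim)  encoder states
    coverage: (batch, src)       attention accumulated over previous steps
    w_q, w_k: (dim, dim)  w_c: (1, dim)  v: (dim,)  learned parameters
    """
    # e_i = v . tanh(W_q q + W_k k_i + w_c c_i): feeding back the attention
    # already spent on position i discourages attending there again.
    hidden = (query @ w_q).unsqueeze(1) + keys @ w_k \
             + coverage.unsqueeze(-1) @ w_c
    scores = torch.tanh(hidden) @ v          # (batch, src)
    attn = F.softmax(scores, dim=-1)
    return attn, coverage + attn             # updated coverage for next step
```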
- Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR [0.0]
Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR.
Recent studies showed that a considerable part of the knowledge of neural-network language models (LMs) can be transferred to traditional n-grams through data augmentation based on neural text generation.
We show that although data augmentation with Transformer-generated text works well for isolating languages, it causes a vocabulary explosion in a morphologically rich language.
We propose a new method called subword-based neural text augmentation, in which we retokenize the generated text into statistically derived subwords.
arXiv Detail & Related papers (2020-07-14T10:22:05Z)
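
Retokenization into statistically derived subwords can be illustrated with an off-the-shelf BPE learner; sentencepiece, the vocabulary size, and the file names below are assumed stand-ins for the paper's actual segmentation setup.

```python
import sentencepiece as spm

# Learn a subword inventory from the training corpus
# (BPE with a 4k vocabulary is an assumption for illustration).
spm.SentencePieceTrainer.train(
    input="train_corpus.txt", model_prefix="subword",
    vocab_size=4000, model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="subword.model")

# Retokenize LM-generated augmentation text into subwords, so the n-gram
# vocabulary stays bounded even for a morphologically rich language.
with open("generated_text.txt") as f:
    for line in f:
        pieces = sp.encode(line.strip(), out_type=str)
        print(" ".join(pieces))
```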
- BSDAR: Beam Search Decoding with Attention Reward in Neural Keyphrase Generation [22.512774028870922]
We introduce a beam search decoding strategy based on word-level and n-gram-level reward functions to constrain and refine Seq2Seq inference at test time.
Results show that our simple proposal can overcome the algorithm's bias toward shorter and nearly identical sequences, resulting in a significant improvement in decoding performance.
arXiv Detail & Related papers (2019-09-17T18:44:54Z)
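
The reward-augmented beam scoring can be sketched generically; the overlap-based reward below is an assumed stand-in for the paper's word-level and n-gram-level attention rewards.

```python
def rescore_hypothesis(log_prob, hyp_tokens, src_tokens, lam=0.5):
    """Beam-search score = model log-probability + lambda * reward.

    Stand-in reward: fraction of hypothesis tokens that appear in the
    source, which pushes the beam away from short, generic sequences.
    The paper instead uses attention-based word- and n-gram-level rewards.
    """
    if not hyp_tokens:
        return log_prob
    src = set(src_tokens)
    reward = sum(tok in src for tok in hyp_tokens) / len(hyp_tokens)
    return log_prob + lam * reward

# At each beam step, rank candidates by the rescored value, not raw log-prob.
candidates = [(-2.1, ["neural", "model"]), (-1.8, ["the", "the"])]
src = ["a", "neural", "keyphrase", "model"]
best = max(candidates, key=lambda c: rescore_hypothesis(c[0], c[1], src))
```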