Pointing to Subwords for Generating Function Names in Source Code
- URL: http://arxiv.org/abs/2011.04241v1
- Date: Mon, 9 Nov 2020 08:17:17 GMT
- Title: Pointing to Subwords for Generating Function Names in Source Code
- Authors: Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
- Abstract summary: We propose two strategies for copying low-frequency or out-of-vocabulary subwords in the input.
Our best-performing model showed an improvement over the conventional method in terms of our modified F1.
- Score: 43.36314933559263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We tackle the task of automatically generating a function name from source
code. Existing generators face difficulties in generating low-frequency or
out-of-vocabulary subwords. In this paper, we propose two strategies for
copying low-frequency or out-of-vocabulary subwords in the input. Our
best-performing model showed an improvement over the conventional method in
terms of our modified F1 and accuracy on the Java-small and Java-large datasets.
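
The copying strategies themselves are not spelled out in this summary. As a rough illustration, the sketch below shows a generic pointer-generator decoding step over subwords (mixing a generation distribution with an attention-based copy distribution), an assumed stand-in rather than the paper's exact formulation; the tensor names and the extended-vocabulary trick for OOV subwords are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pointer_generator_step(vocab_logits, attn_scores, src_ext_ids, p_gen, extended_size):
    """One decoding step that mixes generating a subword from the vocabulary
    with copying a subword from the input (pointer-generator style).

    vocab_logits:  (batch, vocab_size) decoder logits over the subword vocabulary
    attn_scores:   (batch, src_len)    unnormalized attention over input subwords
    src_ext_ids:   (batch, src_len)    ids of input subwords in an *extended*
                                       vocabulary, where OOV subwords get
                                       temporary ids >= vocab_size (assumption)
    p_gen:         (batch, 1)          probability of generating vs. copying
    extended_size: vocab_size + number of distinct OOV subwords in the batch
    """
    batch, vocab_size = vocab_logits.shape
    gen_dist = F.softmax(vocab_logits, dim=-1)   # distribution over in-vocab subwords
    copy_dist = F.softmax(attn_scores, dim=-1)   # distribution over input positions

    # Start from the generation distribution, padded with zeros for OOV slots.
    final = torch.zeros(batch, extended_size, device=vocab_logits.device)
    final[:, :vocab_size] = p_gen * gen_dist

    # Add copy mass onto the subwords actually present in the input; an OOV
    # subword can receive probability only through this copy term.
    final.scatter_add_(1, src_ext_ids, (1.0 - p_gen) * copy_dist)
    return final  # (batch, extended_size), sums to 1 per row
```

Low-frequency and out-of-vocabulary subwords get probability mass only through the copy term, which is exactly the failure mode of purely generative models that the paper targets.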
Related papers
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression [62.261476176242724]
We propose an unsupervised method to extract keywords and keyphrases from texts, based on a pre-trained language model (LM) and Shannon information.
Specifically, our method extracts the phrases having the highest conditional entropy under the LM.
arXiv Detail & Related papers (2023-08-25T14:23:40Z)
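
For intuition, here is a minimal sketch of scoring a candidate phrase by its conditional entropy under a pre-trained LM; GPT-2 via Hugging Face transformers is an assumed stand-in, and the paper's exact scoring and candidate extraction may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def phrase_entropy(context: str, phrase: str) -> float:
    """Mean entropy of the LM's next-token distribution over the phrase
    tokens, conditioned on the context; higher = harder to predict."""
    ctx_ids = tokenizer.encode(context)
    phr_ids = tokenizer.encode(" " + phrase)
    input_ids = torch.tensor([ctx_ids + phr_ids])
    with torch.no_grad():
        logits = model(input_ids).logits[0]  # (seq_len, vocab)
    # Next-token distributions at the positions that predict each phrase token.
    start = len(ctx_ids) - 1
    logp = torch.log_softmax(logits[start:start + len(phr_ids)], dim=-1)
    entropies = -(logp.exp() * logp).sum(dim=-1)
    return entropies.mean().item()

# Rank candidates: keep phrases with the highest conditional entropy.
candidates = ["neural networks", "the", "keyphrase extraction"]
ranked = sorted(candidates,
                key=lambda p: phrase_entropy("This paper studies", p),
                reverse=True)
```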
- Data Augmentation for Low-Resource Keyphrase Generation [46.52115499306222]
Keyphrase generation is the task of summarizing the contents of a given article into a few salient phrases (keyphrases).
Existing work on the task mostly relies on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically designed for keyphrase generation in purely resource-constrained domains.
arXiv Detail & Related papers (2023-05-29T09:20:34Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to predict syntactic sketches more easily at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
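
The summary gives only the high-level idea. The following sketch shows generic hierarchical residual quantization, where a dense encoding becomes a path of codebook indices from coarse to fine; it is an assumed simplification, not HRQ-VAE's actual training procedure.

```python
import torch

def hierarchical_quantize(z, codebooks):
    """Encode a dense vector z as a path of codebook indices, one per level.

    z:         (dim,) dense encoding
    codebooks: list of (num_codes, dim) tensors, ordered coarse to fine
    Returns the index path and the quantized reconstruction.
    """
    path, residual = [], z.clone()
    recon = torch.zeros_like(z)
    for codebook in codebooks:
        # Pick the code closest to what is still unexplained at this level.
        dists = torch.cdist(residual.unsqueeze(0), codebook).squeeze(0)
        idx = int(dists.argmin())
        path.append(idx)
        recon = recon + codebook[idx]
        residual = residual - codebook[idx]
    return path, recon

# Example: a 3-level hierarchy over 16-dim encodings, 8 codes per level.
codebooks = [torch.randn(8, 16) for _ in range(3)]
path, recon = hierarchical_quantize(torch.randn(16), codebooks)
```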
- Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network [10.425277173548212]
We propose a fast, accurate, and compact two-level hierarchical word-and-character-based recurrent neural network model.
We also address the problem of truecasing while ignoring token positions in the sentence.
arXiv Detail & Related papers (2021-08-26T17:54:35Z)
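
A compact sketch of the two-level idea, under assumed simplifications: character-level GRUs summarize each word, a word-level GRU adds sentence context, and a per-character head predicts casing. The paper's actual model and its handling of position invariance are more involved.

```python
import torch
import torch.nn as nn

class HierarchicalTruecaser(nn.Module):
    """Two-level word-and-character RNN: char-GRUs summarize each word,
    a word-level GRU adds sentence context, and a per-character head
    predicts whether each character should be uppercased."""

    def __init__(self, n_chars, char_dim=32, word_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_rnn = nn.GRU(char_dim, word_dim, batch_first=True)
        self.word_rnn = nn.GRU(word_dim, word_dim, batch_first=True)
        self.case_head = nn.Linear(word_dim + char_dim, 2)  # lower / upper

    def forward(self, words):
        # words: list of (1, word_len) char-id tensors for one sentence
        word_vecs, char_embs = [], []
        for w in words:
            emb = self.char_emb(w)            # (1, word_len, char_dim)
            _, h = self.char_rnn(emb)         # final state summarizes the word
            word_vecs.append(h[0])            # (1, word_dim)
            char_embs.append(emb)
        # Sentence context for each word from the word-level GRU.
        ctx, _ = self.word_rnn(torch.stack(word_vecs, dim=1))
        # Per-character casing logits, conditioned on the word's context.
        return [
            self.case_head(torch.cat(
                [ctx[:, i].expand(emb.size(1), -1), emb[0]], dim=-1))
            for i, emb in enumerate(char_embs)
        ]
```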
- A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code [14.904366372190943]
We propose a method, based on identifier anonymization, for handling out-of-vocabulary (OOV) identifiers.
Our method can be treated as a preprocessing step and therefore allows for easy implementation.
We show that the proposed OOV anonymization method significantly improves the performance of the Transformer on two code processing tasks.
arXiv Detail & Related papers (2020-10-23T20:52:46Z)
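
The preprocessing idea is simple to illustrate. In the sketch below, the placeholder format (VAR_1, VAR_2, ...) and the consistent reuse of placeholders within a snippet are assumptions, not details taken from the paper.

```python
def anonymize_oov_identifiers(tokens, vocab):
    """Replace out-of-vocabulary identifiers with numbered placeholders,
    reusing the same placeholder for repeated occurrences so that
    relationships between identifiers are preserved within the snippet."""
    mapping, out = {}, []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            if tok not in mapping:
                mapping[tok] = f"VAR_{len(mapping) + 1}"  # assumed format
            out.append(mapping[tok])
    return out, mapping

vocab = {"def", "return", "(", ")", ":", "+", "x"}
tokens = ["def", "frobnicate", "(", "x", "widget_count", ")", ":",
          "return", "x", "+", "widget_count"]
print(anonymize_oov_identifiers(tokens, vocab))
# (['def', 'VAR_1', '(', 'x', 'VAR_2', ')', ':',
#   'return', 'x', '+', 'VAR_2'], {'frobnicate': 'VAR_1', ...})
```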
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z)
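
SEG-Net's components are not detailed in this summary. The sketch below shows one generic form of coverage attention (additive attention with an accumulated-attention term), an assumed simplification of the layer-wise mechanism rather than SEG-Net's actual design.

```python
import torch
import torch.nn.functional as F

def coverage_attention_step(query, keys, coverage, w_q, w_k, w_c, v):
    """One additive-attention step with a coverage term.

    query:    (batch, dim)       current decoder state
    keys:     (batch, src, dim)  encoder states
    coverage: (batch, src)       attention accumulated over previous steps
    w_q, w_k: (dim, dim)  w_c: (1, dim)  v: (dim,)  learned parameters
    """
    # e_i = v . tanh(W_q q + W_k k_i + w_c c_i): feeding back the attention
    # already spent on position i discourages attending there again.
    hidden = (query @ w_q).unsqueeze(1) + keys @ w_k \
             + coverage.unsqueeze(-1) @ w_c
    scores = torch.tanh(hidden) @ v          # (batch, src)
    attn = F.softmax(scores, dim=-1)
    return attn, coverage + attn             # updated coverage for next step
```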
- Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR [0.0]
Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR.
Recent studies showed that a considerable part of the knowledge of neural-network language models (LMs) can be transferred to traditional n-grams through data augmentation based on neural text generation.
We show that although data augmentation with Transformer-generated text works well for isolating languages, it causes a vocabulary explosion in a morphologically rich language.
We propose a new method called subword-based neural text augmentation, in which we retokenize the generated text into statistically derived subwords.
arXiv Detail & Related papers (2020-07-14T10:22:05Z)
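
Retokenization into statistically derived subwords can be illustrated with an off-the-shelf BPE learner; sentencepiece, the vocabulary size, and the file names below are assumed stand-ins for the paper's actual segmentation setup.

```python
import sentencepiece as spm

# Learn a subword inventory from the training corpus
# (BPE with a 4k vocabulary is an assumption for illustration).
spm.SentencePieceTrainer.train(
    input="train_corpus.txt", model_prefix="subword",
    vocab_size=4000, model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="subword.model")

# Retokenize LM-generated augmentation text into subwords, so the n-gram
# vocabulary stays bounded even for a morphologically rich language.
with open("generated_text.txt") as f:
    for line in f:
        pieces = sp.encode(line.strip(), out_type=str)
        print(" ".join(pieces))
```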
- BSDAR: Beam Search Decoding with Attention Reward in Neural Keyphrase Generation [22.512774028870922]
We introduce a beam search decoding strategy based on word-level and n-gram-level reward functions to constrain and refine Seq2Seq inference at test time.
Results show that our simple proposal can overcome the algorithm's bias toward shorter and nearly identical sequences, resulting in a significant improvement in decoding performance.
arXiv Detail & Related papers (2019-09-17T18:44:54Z)
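
The reward-augmented beam scoring can be sketched generically; the overlap-based reward below is an assumed stand-in for the paper's word-level and n-gram-level attention rewards.

```python
def rescore_hypothesis(log_prob, hyp_tokens, src_tokens, lam=0.5):
    """Beam-search score = model log-probability + lambda * reward.

    Stand-in reward: fraction of hypothesis tokens that appear in the
    source, which pushes the beam away from short, generic sequences.
    The paper instead uses attention-based word- and n-gram-level rewards.
    """
    if not hyp_tokens:
        return log_prob
    src = set(src_tokens)
    reward = sum(tok in src for tok in hyp_tokens) / len(hyp_tokens)
    return log_prob + lam * reward

# At each beam step, rank candidates by the rescored value, not raw log-prob.
candidates = [(-2.1, ["neural", "model"]), (-1.8, ["the", "the"])]
src = ["a", "neural", "keyphrase", "model"]
best = max(candidates, key=lambda c: rescore_hypothesis(c[0], c[1], src))
```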