Improving Tree-Structured Decoder Training for Code Generation via
Mutual Learning
- URL: http://arxiv.org/abs/2105.14796v1
- Date: Mon, 31 May 2021 08:44:13 GMT
- Title: Improving Tree-Structured Decoder Training for Code Generation via
Mutual Learning
- Authors: Binbin Xie, Jinsong Su, Yubin Ge, Xiang Li, Jianwei Cui, Junfeng Yao
and Bin Wang
- Abstract summary: Code generation aims to automatically generate a piece of code given an input natural language utterance.
We first thoroughly analyze the context modeling difference between neural code generation models with different decodings.
We propose to introduce a mutual learning framework to jointly train these models.
- Score: 27.080718377956693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code generation aims to automatically generate a piece of code given an input
natural language utterance. Currently, among dominant models, it is treated as
a sequence-to-tree task, where a decoder outputs a sequence of actions
corresponding to the pre-order traversal of an Abstract Syntax Tree. However,
such a decoder only exploits the preceding actions given by the pre-order traversal,
which are insufficient to ensure correct action predictions. In this paper, we
first thoroughly analyze the context modeling difference between neural code
generation models with different traversal-based decodings (pre-order traversal
vs breadth-first traversal), and then propose to introduce a mutual learning
framework to jointly train these models. Under this framework, we continuously
enhance both models via mutual distillation, which involves the synchronous
execution of two one-to-one knowledge transfers at each training step. More
specifically, we alternately choose one model as the student and the other as
its teacher, and require the student to fit the training data and the action
prediction distributions of its teacher. By doing so, both models can fully
absorb knowledge from each other and thus be improved simultaneously.
Experimental results and in-depth analysis on several benchmark datasets
demonstrate the effectiveness of our approach. We release our code at
https://github.com/DeepLearnXMU/CGML.
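A minimal sketch of the recipe described above (not the authors' released code; see the repository linked in the abstract): it shows how pre-order and breadth-first traversals of the same AST yield different action orders, and a symmetric mutual-distillation loss in which each decoder fits the gold actions and the other decoder's detached action-prediction distribution. All class and function names, the per-node alignment of the two models' predictions, and the weight `alpha` are illustrative assumptions.

```python
from collections import deque

import torch.nn.functional as F


class Node:
    """A minimal AST node: a grammar-action label plus ordered children."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def preorder_actions(root):
    """Action sequence given by a pre-order (depth-first) traversal."""
    actions, stack = [], [root]
    while stack:
        node = stack.pop()
        actions.append(node.label)
        stack.extend(reversed(node.children))  # keep left-to-right child order
    return actions


def bfs_actions(root):
    """Action sequence given by a breadth-first (level-order) traversal."""
    actions, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        actions.append(node.label)
        queue.extend(node.children)
    return actions


def mutual_distillation_loss(logits_pre, logits_bfs, gold, alpha=0.5):
    """Symmetric mutual-distillation objective for one training step (sketch).

    Each decoder fits the gold action of every AST node (cross-entropy) and,
    acting as the student, the other decoder's action-prediction distribution
    (KL against the detached teacher). Both logit tensors are assumed to be
    aligned to the same per-node order; the real decoders score actions in
    their own traversal orders and would be aligned by node identity.
    """
    ce_pre = F.cross_entropy(logits_pre, gold)
    ce_bfs = F.cross_entropy(logits_bfs, gold)
    kl_pre = F.kl_div(F.log_softmax(logits_pre, dim=-1),
                      F.softmax(logits_bfs, dim=-1).detach(),
                      reduction="batchmean")
    kl_bfs = F.kl_div(F.log_softmax(logits_bfs, dim=-1),
                      F.softmax(logits_pre, dim=-1).detach(),
                      reduction="batchmean")
    return (ce_pre + alpha * kl_pre) + (ce_bfs + alpha * kl_bfs)


# AST for "2 * 3 + 1": the two traversals expose different preceding contexts.
ast = Node("BinOp:+", [Node("BinOp:*", [Node("Num:2"), Node("Num:3")]),
                       Node("Num:1")])
print(preorder_actions(ast))  # ['BinOp:+', 'BinOp:*', 'Num:2', 'Num:3', 'Num:1']
print(bfs_actions(ast))       # ['BinOp:+', 'BinOp:*', 'Num:1', 'Num:2', 'Num:3']
```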
Related papers
- Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition [5.575078692353885]
We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy.
By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously.
arXiv Detail & Related papers (2024-10-23T11:06:36Z)
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens [31.568675300434816]
Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset.
During inference time, they are utilized differently, generating text sequentially and auto-regressively by using previously generated tokens as input to predict the next one.
This paper proposes two simple approaches based on the model's own generations to address this discrepancy between training and inference time.
arXiv Detail & Related papers (2024-10-18T17:48:27Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both the randomness of masked language modeling and the structural aspect of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space (a toy sketch contrasting the two appears after this list).
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
- Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation [0.0]
We present a novel semi-supervised segmentation model based on a parameter decoupling strategy to encourage consistent predictions from diverse views.
Our method achieves results competitive with state-of-the-art semi-supervised methods on the Atrial Challenge dataset.
arXiv Detail & Related papers (2021-09-20T14:51:42Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
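The sub-structured distillation entry above hinges on one point: for structured outputs, matching the teacher and student over all complete output sequences grows exponentially with output length, while matching them locally on each sub-structure stays linear. The toy sketch below (not the paper's implementation; names, shapes, and the per-position choice of sub-structure are assumptions) makes that contrast explicit.

```python
import itertools

import torch
import torch.nn.functional as F


def whole_space_kl(student_logits, teacher_logits):
    """KL over all complete label sequences: V**T terms, quickly intractable."""
    T, V = student_logits.shape
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    kl = torch.tensor(0.0)
    for seq in itertools.product(range(V), repeat=T):  # enumerate V**T sequences
        t_seq = sum(t_logp[i, y] for i, y in enumerate(seq))  # log P_teacher(seq)
        s_seq = sum(s_logp[i, y] for i, y in enumerate(seq))  # log P_student(seq)
        kl = kl + torch.exp(t_seq) * (t_seq - s_seq)
    return kl


def sub_structured_kl(student_logits, teacher_logits):
    """Sum of per-position KLs: T * V terms, the tractable local alternative."""
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="sum")


# For this toy model the positions are independent, so the two quantities
# coincide; in a genuinely structured model (e.g. a CRF) the full-sequence
# distribution does not factorise, and only the local version stays cheap.
student = torch.randn(4, 3)  # T=4 positions, V=3 labels
teacher = torch.randn(4, 3)
print(whole_space_kl(student, teacher))    # enumerates 3**4 = 81 sequences
print(sub_structured_kl(student, teacher))
```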