CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language
Understanding and Generation
- URL: http://arxiv.org/abs/2109.05729v2
- Date: Tue, 14 Sep 2021 08:35:14 GMT
- Title: CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language
Understanding and Generation
- Authors: Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe,
Hujun Bao, Xipeng Qiu
- Abstract summary: Chinese Pre-trained Unbalanced Transformer (CPT) is designed for both natural language understanding (NLU) and natural language generation (NLG) tasks.
CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder.
With a partially shared architecture and multi-task pre-training, CPT can learn specific knowledge for both NLU and NLG tasks with its two decoders.
- Score: 38.02741711554989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we take the advantage of previous pre-trained models (PTMs)
and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Different
from previous Chinese PTMs, CPT is designed for both natural language
understanding (NLU) and natural language generation (NLG) tasks. CPT consists
of three parts: a shared encoder, an understanding decoder, and a generation
decoder. Two specific decoders with a shared encoder are pre-trained with
masked language modeling (MLM) and denoising auto-encoding (DAE) tasks,
respectively. With the partially shared architecture and multi-task
pre-training, CPT can (1) learn specific knowledge for both NLU and NLG tasks
with its two decoders and (2) be fine-tuned flexibly in ways that fully exploit
the potential of the model. Moreover, the unbalanced Transformer reduces
computational and storage costs, which makes CPT competitive and greatly
accelerates inference for text generation. Experimental results on a wide
range of Chinese NLU and NLG tasks show the effectiveness of CPT.
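The partially shared, unbalanced design described above can be pictured concretely. The following PyTorch sketch is a minimal illustration only, under assumed hyperparameters (the class name, vocabulary size, and layer counts are illustrative, not taken from the paper or its released code): a deep shared encoder feeds a shallow non-autoregressive understanding decoder used with an MLM head and a shallow autoregressive generation decoder trained with denoising auto-encoding.

```python
import torch
import torch.nn as nn


class CPTSketch(nn.Module):
    """Sketch of a shared encoder with an understanding (MLM) and a generation (DAE) decoder."""

    def __init__(self, vocab_size=30000, d_model=768, nhead=12,
                 enc_layers=10, dec_layers=2):  # illustrative sizes only
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Deep shared encoder: most parameters live here.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), enc_layers)
        # Shallow understanding decoder: non-autoregressive, paired with an MLM head.
        self.u_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), dec_layers)
        # Shallow generation decoder: autoregressive, cross-attends to the encoder (DAE objective).
        self.g_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), dec_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids=None):
        memory = self.encoder(self.embed(src_ids))           # shared representation
        mlm_logits = self.lm_head(self.u_decoder(memory))    # NLU branch (predict masked tokens)
        gen_logits = None
        if tgt_ids is not None:                              # NLG branch (reconstruct clean text)
            L = tgt_ids.size(1)
            causal = torch.triu(torch.full((L, L), float("-inf"),
                                           device=tgt_ids.device), diagonal=1)
            gen_logits = self.lm_head(
                self.g_decoder(self.embed(tgt_ids), memory, tgt_mask=causal))
        return mlm_logits, gen_logits


# Example forward pass with random token ids (batch of 2, length 16).
model = CPTSketch()
src = torch.randint(0, 30000, (2, 16))   # masked/corrupted input ids
tgt = torch.randint(0, 30000, (2, 16))   # shifted target ids for the DAE branch
mlm_logits, gen_logits = model(src, tgt)
```

Because both decoders are shallow relative to the shared encoder, most parameters are shared between the NLU and NLG paths, which is what makes the Transformer "unbalanced" and keeps generation inference comparatively cheap.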
Related papers
- How to get better embeddings with code pre-trained models? An empirical
study [6.220333404184779]
We study five different code pre-trained models (PTMs) to generate embeddings for downstream classification tasks.
We find that embeddings obtained through special tokens do not sufficiently aggregate the semantic information of the entire code snippet.
The quality of code embeddings obtained by combining code data and text data in the same way as during pre-training of the PTMs is poor and does not guarantee richer semantic information.
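As a hedged illustration of the comparison above, here is a minimal sketch that contrasts the special-token embedding with mean pooling over all tokens for one code snippet; the checkpoint name and the snippet are assumptions for illustration, not the study's setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/codebert-base"  # assumed checkpoint; any encoder-style code PTM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state      # (1, seq_len, hidden_size)

cls_embedding = hidden[:, 0]                        # embedding read off the special token
mask = inputs["attention_mask"].unsqueeze(-1)       # exclude padding from the average
mean_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # whole-snippet aggregation
```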
arXiv Detail & Related papers (2023-11-14T10:44:21Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired
Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C reduces the word error rate (WER) by 19.2% relative, compared with the method without decoder pre-training.
arXiv Detail & Related papers (2022-03-31T15:33:56Z)
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for
Code Understanding and Generation [36.47905744758698]
We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers.
Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning.
arXiv Detail & Related papers (2021-09-02T12:21:06Z)
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and
Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z)
- Scheduled Sampling in Vision-Language Pretraining with Decoupled
Encoder-Decoder Network [99.03895740754402]
We propose a two-stream decoupled encoder-decoder design, in which the cross-modal encoder and decoder are decoupled into two streams.
As an alternative, we propose a primary scheduled sampling strategy that mitigates the discrepancy between training and inference by pretraining the encoder-decoder in a two-pass manner.
arXiv Detail & Related papers (2021-01-27T17:36:57Z)
- UniVL: A Unified Video and Language Pre-Training Model for Multimodal
Understanding and Generation [76.12027504427708]
This paper proposes UniVL: a Unified Video and Language pre-training model for both multimodal understanding and generation.
It comprises four components, including two single-modal encoders, a cross encoder, and a decoder with the Transformer backbone.
We develop two pre-training strategies, stage-by-stage pre-training (StagedP) and enhanced video representation (EnhancedV), to make the training of UniVL more effective.
arXiv Detail & Related papers (2020-02-15T10:03:25Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space (see the sketch after this list).
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
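The bi-decoder idea in the last entry is close in spirit to CPT's shared-encoder design. As a minimal sketch under stated assumptions (a shared vocabulary, illustrative layer counts, and random ids; none of these names or sizes come from the paper), one encoder output is decoded toward two target ends and the two teacher-forced losses are summed, nudging the shared encoder toward a more language-independent semantic space.

```python
import torch
import torch.nn as nn

d_model, nhead, vocab = 512, 8, 32000               # illustrative sizes; shared vocabulary assumed
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), 6)
decoder_a = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), 6)
decoder_b = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), 6)
head_a = nn.Linear(d_model, vocab)
head_b = nn.Linear(d_model, vocab)
criterion = nn.CrossEntropyLoss()


def joint_loss(src_ids, tgt_a_ids, tgt_b_ids):
    """Sum of both decoders' teacher-forced losses computed on one shared encoding."""
    memory = encoder(embed(src_ids))                # shared, ideally language-independent encoding
    total = 0.0
    for dec, head, tgt in ((decoder_a, head_a, tgt_a_ids), (decoder_b, head_b, tgt_b_ids)):
        inp, gold = tgt[:, :-1], tgt[:, 1:]         # shift targets for teacher forcing
        L = inp.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        logits = head(dec(embed(inp), memory, tgt_mask=causal))
        total = total + criterion(logits.reshape(-1, vocab), gold.reshape(-1))
    return total


# One example step with random ids (batch of 2, length 16).
src = torch.randint(0, vocab, (2, 16))
tgt_a = torch.randint(0, vocab, (2, 16))
tgt_b = torch.randint(0, vocab, (2, 16))
joint_loss(src, tgt_a, tgt_b).backward()
```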