ProphetNet: Predicting Future N-gram for Sequence-to-Sequence
Pre-training
- URL: http://arxiv.org/abs/2001.04063v3
- Date: Wed, 21 Oct 2020 05:45:35 GMT
- Title: ProphetNet: Predicting Future N-gram for Sequence-to-Sequence
Pre-training
- Authors: Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen,
Ruofei Zhang, Ming Zhou
- Abstract summary: We present a new sequence-to-sequence pre-training model called ProphetNet.
It introduces a novel self-supervised objective named future n-gram prediction.
We conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks.
- Score: 85.35910219651572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a new sequence-to-sequence pre-training model called
ProphetNet, which introduces a novel self-supervised objective named future
n-gram prediction and the proposed n-stream self-attention mechanism. Instead
of optimizing one-step-ahead prediction as in the traditional sequence-to-sequence
model, ProphetNet is optimized by n-step-ahead prediction, which predicts the
next n tokens simultaneously based on previous context tokens at each time
step. The future n-gram prediction explicitly encourages the model to plan for
future tokens and prevents overfitting on strong local correlations. We
pre-train ProphetNet using a base-scale dataset (16GB) and a large-scale
dataset (160GB), respectively. Then we conduct experiments on CNN/DailyMail,
Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question
generation tasks. Experimental results show that ProphetNet achieves new
state-of-the-art results on all these datasets compared to the models using the
same scale pre-training corpus.
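Concretely, the future n-gram objective amounts to summing n shifted next-token losses, one per prediction stream. The sketch below illustrates that loss under simplifying assumptions (a separate logits tensor per stream and hand-picked alpha weights); it is not the released ProphetNet code, and the n-stream self-attention that would produce the stream logits is omitted.

```python
import torch
import torch.nn.functional as F

def future_ngram_loss(stream_logits, targets, alphas):
    """Future n-gram prediction loss (illustrative sketch).

    stream_logits: list of n tensors [batch, seq_len, vocab]; stream k
                   holds logits for the token (k+1) steps ahead.
    targets:       [batch, seq_len] gold ids, aligned for next-token
                   prediction (targets[:, t] follows input position t).
    alphas:        n weights balancing the per-stream losses.
    """
    total = torch.zeros(())
    for k, (logits, alpha) in enumerate(zip(stream_logits, alphas)):
        valid = targets.size(1) - k  # positions with a (k+1)-ahead target
        if valid <= 0:
            break
        # Shift targets left by k so stream k is scored on y_{t+k+1}.
        gold = targets[:, k:].reshape(-1)
        preds = logits[:, :valid, :].reshape(-1, logits.size(-1))
        total = total + alpha * F.cross_entropy(preds, gold)
    return total

# Toy usage: n = 2 streams, batch 4, length 10, vocab 100.
streams = [torch.randn(4, 10, 100, requires_grad=True) for _ in range(2)]
gold = torch.randint(0, 100, (4, 10))
future_ngram_loss(streams, gold, alphas=[1.0, 0.5]).backward()
```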
Related papers
- Sparse Prototype Network for Explainable Pedestrian Behavior Prediction [60.80524827122901]
We present Sparse Prototype Network (SPN), an explainable method designed to simultaneously predict a pedestrian's future action, trajectory, and pose.
Regularized by mono-semanticity and clustering constraints, the prototypes learn consistent and human-understandable features.
arXiv Detail & Related papers (2024-10-16T03:33:40Z)
- TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction.
In tandem with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution.
This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
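As a rough illustration of mixing token-prediction objectives, the sketch below samples one objective per batch; the sampling scheme, masking rate, and the shared-head treatment of next-all prediction are assumptions for illustration, not TokenUnify's actual recipe.

```python
import random
import torch
import torch.nn.functional as F

def mixed_token_objective(model, tokens, mask_id, weights=(1, 1, 1)):
    """Sample one of three pretraining objectives per batch (sketch).

    model:  callable mapping [batch, seq] ids to [batch, seq, vocab]
            logits (a hypothetical stand-in for the actual network).
    tokens: [batch, seq_len] input ids.
    """
    objective = random.choices(["random", "next", "next_all"], weights)[0]
    if objective == "random":
        # Random token prediction: mask ~15% of positions, recover originals.
        corrupted = tokens.clone()
        mask = torch.rand(tokens.shape) < 0.15
        corrupted[mask] = mask_id
        logits = model(corrupted)
        return F.cross_entropy(logits[mask], tokens[mask])
    if objective == "next":
        # Standard next-token prediction.
        logits = model(tokens[:, :-1])
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tokens[:, 1:].reshape(-1))
    # "Next-all" prediction, truncated to the first 4 future offsets and
    # scored with a single shared head -- a simplification for brevity
    # (assumes seq_len > 4).
    logits = model(tokens)
    losses = []
    for k in range(1, 5):
        valid = tokens.size(1) - k
        losses.append(F.cross_entropy(
            logits[:, :valid].reshape(-1, logits.size(-1)),
            tokens[:, k:].reshape(-1)))
    return torch.stack(losses).mean()
```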
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
- PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [8.272409756443539]
This paper describes PerfSAGE, a novel graph neural network (GNN) that predicts inference latency, energy, and memory footprint on an arbitrary DNNlite graph.
Using this dataset, we train PerfSAGE and provide experimental results that demonstrate state-of-the-art prediction accuracy with a Mean Absolute Percentage Error of 5% across all targets and model search spaces.
arXiv Detail & Related papers (2023-01-26T08:59:15Z)
- NETpred: Network-based modeling and prediction of multiple connected market indices [8.122270502556372]
We introduce a framework called NETpred that generates a novel graph representing multiple related indices and their stocks.
It then carefully selects a diverse set of representative nodes that cover different parts of the state space and whose price movements are accurately predictable.
The resulting model is then used to predict the stock labels, which are finally aggregated to infer the labels for all the index nodes in the graph.
arXiv Detail & Related papers (2022-12-02T17:23:09Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
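A minimal sketch of this additive, boosting-style view of an early-exit network follows (toy MLP backbone; the paper's specific training techniques, such as how earlier exits are weighted or frozen, are not reproduced here).

```python
import torch
import torch.nn as nn

class BoostedEarlyExitNet(nn.Module):
    """Early-exit network read as an additive model: exit k outputs the
    running sum of the first k heads, so each head learns a correction
    on top of the shallower exits (illustrative sketch only)."""

    def __init__(self, dim=64, num_classes=10, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(num_blocks))
        self.heads = nn.ModuleList(
            nn.Linear(dim, num_classes) for _ in range(num_blocks))

    def forward(self, x):
        exits, running, h = [], 0.0, x
        for block, head in zip(self.blocks, self.heads):
            h = block(h)
            # Additive ensemble: each exit refines the previous exits' sum.
            # (Detaching `running` here would mimic classic boosting, where
            # earlier learners are frozen while fitting the next one.)
            running = running + head(h)
            exits.append(running)
        return exits  # one set of logits per early exit

# Toy usage: supervise every exit with the same labels.
net = BoostedEarlyExitNet()
outs = net(torch.randn(8, 64))
labels = torch.randint(0, 10, (8,))
loss = sum(nn.functional.cross_entropy(o, labels) for o in outs)
```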
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Pyramidal Predictive Network: A Model for Visual-frame Prediction Based on Predictive Coding Theory [1.4610038284393165]
We propose a novel neural network model for the task of visual-frame prediction.
The model is composed of a series of recurrent and convolutional units forming the top-down and bottom-up streams.
It learns to predict future frames in a visual sequence, with ConvLSTMs at each layer of the network making local predictions from the top down.
arXiv Detail & Related papers (2022-08-15T06:28:34Z)
- Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction [26.25110973770013]
Previous works on human motion prediction follow the pattern of building a mapping between the observed sequence and the one to be predicted.
We present a new prediction pattern that introduces previously overlooked human poses into the prediction task.
These poses occur after the predicted sequence and form the privileged sequence.
arXiv Detail & Related papers (2022-08-02T08:13:43Z)
- AutoCP: Automated Pipelines for Accurate Prediction Intervals [84.16181066107984]
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP).
Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate.
We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
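The coverage guarantee AutoCP targets is the standard conformal one; below is a minimal split-conformal baseline for regression, a sketch of the underlying idea rather than AutoCP's automated pipeline search.

```python
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction: intervals with ~(1 - alpha) marginal
    coverage around any fitted point predictor `predict`."""
    residuals = np.abs(y_cal - predict(X_cal))   # calibration scores
    n = len(residuals)
    # Finite-sample-corrected empirical quantile of the residuals.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, level)
    preds = predict(X_test)
    return preds - q, preds + q
```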
arXiv Detail & Related papers (2020-06-24T23:13:11Z)
- Predicting Temporal Sets with Deep Neural Networks [50.53727580527024]
We propose an integrated solution based on the deep neural networks for temporal sets prediction.
A unique perspective is to learn element relationships by constructing a set-level co-occurrence graph.
We design an attention-based module to adaptively learn the temporal dependency of elements and sets.
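The graph-construction step can be pictured with a toy example; this sketches only the set-level co-occurrence graph, not the paper's full model.

```python
from collections import defaultdict
from itertools import combinations

def set_cooccurrence_graph(set_sequence):
    """Weighted element co-occurrence graph from a sequence of sets
    (e.g. shopping baskets): edge weight = number of sets in which the
    two elements appear together."""
    weights = defaultdict(int)
    for s in set_sequence:
        for u, v in combinations(sorted(s), 2):
            weights[(u, v)] += 1
    return dict(weights)

# Toy usage: three baskets.
g = set_cooccurrence_graph(
    [{"milk", "bread"}, {"milk", "eggs", "bread"}, {"eggs"}])
# {('bread', 'eggs'): 1, ('bread', 'milk'): 2, ('eggs', 'milk'): 1}
```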
arXiv Detail & Related papers (2020-06-20T03:29:02Z)
- Modeling Musical Onset Probabilities via Neural Distribution Learning [11.094116617743962]
Musical onset detection can be formulated as a time-to-event (TTE) or time-since-event (TSE) prediction task.
We propose a novel method to model the probability of onsets by introducing a sequential density prediction model.
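A toy formulation of the time-to-event framing follows; the frame discretization, clamp horizon, and target construction are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn.functional as F

def tte_targets(onset_frames, num_frames, max_horizon):
    """Time-to-event targets: for each frame, the number of frames until
    the next onset, clamped to max_horizon (illustrative sketch)."""
    tte = torch.full((num_frames,), max_horizon, dtype=torch.long)
    next_onset = None
    for t in range(num_frames - 1, -1, -1):
        if t in onset_frames:
            next_onset = t
        if next_onset is not None:
            tte[t] = min(next_onset - t, max_horizon)
    return tte

# A model emitting per-frame logits over {0..max_horizon} can then be
# trained with plain cross-entropy against these targets:
logits = torch.randn(100, 17)   # 100 frames, horizon 16 (+ clamp bin)
targets = tte_targets({3, 40, 77}, num_frames=100, max_horizon=16)
loss = F.cross_entropy(logits, targets)
```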
arXiv Detail & Related papers (2020-02-10T05:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.