BioCopy: A Plug-And-Play Span Copy Mechanism in Seq2Seq Models
- URL: http://arxiv.org/abs/2109.12533v1
- Date: Sun, 26 Sep 2021 08:55:26 GMT
- Title: BioCopy: A Plug-And-Play Span Copy Mechanism in Seq2Seq Models
- Authors: Yi Liu, Guoan Zhang, Puning Yu, Jianlin Su, Shengfeng Pan
- Abstract summary: We propose a plug-and-play architecture, namely BioCopy, to alleviate the problem of losing essential tokens while copying long spans.
Specifically, in the training stage, we construct a BIO tag for each token and train the original model with BIO tags jointly.
In the inference stage, the model first predicts the BIO tag at each time step and then applies different masking strategies based on the predicted BIO label.
Experimental results on two separate generative tasks show that adding our BioCopy to the original model structure outperforms the baseline models on both.
- Score: 3.823919891699282
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Copy mechanisms explicitly obtain unchanged tokens from the source (input)
sequence to generate the target (output) sequence under the neural seq2seq
framework. However, most of the existing copy mechanisms only consider single
word copying from the source sentences, which results in losing essential
tokens while copying long spans. In this work, we propose a plug-and-play
architecture, namely BioCopy, to alleviate the aforementioned problem.
Specifically, in the training stage, we construct a BIO tag for each token and
train the original model with BIO tags jointly. In the inference stage, the
model first predicts the BIO tag at each time step, then applies different
masking strategies based on the predicted BIO label to narrow the scope of the
probability distribution over the vocabulary. Experimental results on two
separate generative tasks show that adding our BioCopy to the original model
structure outperforms the baseline models on both.
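To make the mechanism concrete, here is a minimal sketch of how the inference-time masking could look, assuming the usual BIO semantics (O: generate freely; B: begin a copy, restricted to tokens present in the source; I: continue the span, restricted to the next source position). The function name, tag semantics, and tensor layout are illustrative assumptions based only on the abstract, not the authors' released implementation.

```python
import torch

def biocopy_mask(tag: str, src_ids: torch.Tensor, prev_src_pos: int,
                 vocab_size: int) -> torch.Tensor:
    """Additive log-mask over the vocabulary for one decoding step.

    Assumed semantics (illustrative, not the released code):
      O -> no mask, generate freely from the whole vocabulary
      B -> allow only tokens that occur somewhere in the source
      I -> allow only the source token right after the last copied position
    """
    if tag == "O":
        return torch.zeros(vocab_size)            # nothing is blocked
    mask = torch.full((vocab_size,), float("-inf"))
    if tag == "B":
        mask[src_ids] = 0.0                       # any source token may start a span
    elif tag == "I" and 0 <= prev_src_pos + 1 < len(src_ids):
        mask[src_ids[prev_src_pos + 1]] = 0.0     # continue the copied span
    return mask

# One decoding step: predict the BIO tag first, then decode under the mask.
# logits = decoder_logits + biocopy_mask(predicted_tag, src_ids, prev_src_pos, V)
# next_token = logits.argmax(-1)
```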
Related papers
- Mitigating Copy Bias in In-Context Learning through Neuron Pruning [74.91243772654519]
Large language models (LLMs) have demonstrated impressive few-shot in-context learning abilities.
They are sometimes prone to a 'copying bias', where they copy answers from provided examples instead of learning the underlying patterns.
We propose a novel and simple method to mitigate such copying bias.
arXiv Detail & Related papers (2024-10-02T07:18:16Z)
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation [132.00910067533982]
We introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations.
We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters.
arXiv Detail & Related papers (2024-07-09T17:58:18Z)
- From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers [41.82477691012942]
We study learning a 1-layer self-attention model from a set of prompts and associated output data.
We first establish a precise mapping between the self-attention mechanism and Markov models.
We characterize an intriguing winner-takes-all phenomenon where the generative process implemented by self-attention collapses into sampling a limited subset of tokens.
arXiv Detail & Related papers (2024-02-21T03:51:34Z)
- Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z)
- May the Force Be with Your Copy Mechanism: Enhanced Supervised-Copy Method for Natural Language Generation [1.2453219864236247]
We propose a novel supervised approach of a copy network that helps the model decide which words need to be copied and which need to be generated.
Specifically, we re-define the objective function, which leverages source sequences and target vocabularies as guidance for copying.
The experimental results on data-to-text generation and abstractive summarization tasks verify that our approach enhances the copying quality and improves the degree of abstractness.
arXiv Detail & Related papers (2021-12-20T06:54:28Z)
- On the Copying Behaviors of Pre-Training for Neural Machine Translation [63.914940899327966]
Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance.
In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT.
We propose a simple and effective method named copying penalty to control the copying behaviors in decoding.
arXiv Detail & Related papers (2021-07-17T10:02:30Z)
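The summary above does not spell out the form of the copying penalty, so the following is only a hedged sketch of one plausible instantiation; the additive formulation and the function name are assumptions rather than the paper's definition. The idea sketched here: during decoding, down-weight the logits of tokens that already appear in the source sentence by a tunable coefficient.

```python
import torch

def apply_copying_penalty(logits: torch.Tensor, src_ids: torch.Tensor,
                          penalty: float = 1.0) -> torch.Tensor:
    """Down-weight source-side tokens to discourage over-copying during decoding.

    The additive penalty is an assumption for illustration; the paper only
    states that a "copying penalty" controls copying behaviors in decoding.
    """
    penalized = logits.clone()
    penalized[..., torch.unique(src_ids)] -= penalty  # subtract from source tokens only
    return penalized

# Toy usage: a 10-token vocabulary where tokens 2 and 5 occur in the source.
# adjusted = apply_copying_penalty(torch.randn(10), torch.tensor([2, 5]), penalty=2.0)
```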
- Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling [65.51280121472146]
We exploit what we intrinsically know about ontology labels to build efficient semantic parsing models.
Our model is highly efficient using a low-resource benchmark derived from TOPv2.
arXiv Detail & Related papers (2021-04-15T04:01:02Z)
- Fast and Effective Biomedical Entity Linking Using a Dual Encoder [48.86736921025866]
We propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot.
We show that our proposed model is multiple times faster than existing BERT-based models while being competitive in accuracy for biomedical entity linking.
arXiv Detail & Related papers (2021-03-08T19:32:28Z)
- CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models [31.832217465573503]
We present a model with an explicit token-level copy operation and extend it to copying entire spans.
Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction.
arXiv Detail & Related papers (2020-10-28T22:45:16Z)
- Copy that! Editing Sequences by Copying Spans [40.23377412674599]
We present an extension of seq2seq models capable of copying entire spans of the input to the output in one step.
In experiments on a range of editing tasks of natural language and source code, we show that our new model consistently outperforms simpler baselines.
arXiv Detail & Related papers (2020-06-08T17:42:18Z)
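The CopyNext and "Copy that!" entries above both describe decoders that can emit an entire input span in one step rather than one token at a time. As a purely illustrative sketch (the action names and realization function are assumptions, not either paper's model), the decoder's output space can be viewed as a mix of vocabulary tokens and source spans:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class GenToken:
    token: str      # generate a single vocabulary token

@dataclass
class CopySpan:
    start: int      # inclusive start index into the source
    end: int        # exclusive end index into the source

Action = Union[GenToken, CopySpan]

def realize(actions: List[Action], source: List[str]) -> List[str]:
    """Expand a sequence of decoder actions into the output token sequence."""
    out: List[str] = []
    for action in actions:
        if isinstance(action, GenToken):
            out.append(action.token)
        else:
            out.extend(source[action.start:action.end])  # copy the whole span in one step
    return out

# Example: copy source[1:4] with a single action, then generate a period.
# realize([CopySpan(1, 4), GenToken(".")], ["the", "quick", "brown", "fox"])
# -> ["quick", "brown", "fox", "."]
```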
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.