Transforming Sequence Tagging Into A Seq2Seq Task
- URL: http://arxiv.org/abs/2203.08378v1
- Date: Wed, 16 Mar 2022 03:48:14 GMT
- Title: Transforming Sequence Tagging Into A Seq2Seq Task
- Authors: Karthik Raman and Iftekhar Naim and Jiecao Chen and Kazuma Hashimoto
and Kiran Yalasangi and Krishna Srinivasan
- Abstract summary: We study different formats one could use for casting input text sentences and their output labels into the input and target of a Seq2Seq model.
We introduce a new format, which we show to not only be simpler but also more effective.
We find that the new format is more robust and almost completely devoid of hallucination.
- Score: 10.130389627403433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained, large, generative language models (LMs) have had great success in
a wide range of sequence tagging and structured prediction tasks. Casting a
sequence tagging task as a Seq2Seq one requires deciding the formats of the
input and output sequences. However, we lack a principled understanding of the
trade-offs associated with these formats (such as the effect on model accuracy,
sequence length, multilingual generalization, hallucination). In this paper, we
rigorously study different formats one could use for casting input text
sentences and their output labels into the input and target (i.e., output) of a
Seq2Seq model. Along the way, we introduce a new format, which we show to not
only be simpler but also more effective. Additionally, the new format
demonstrates significant gains in multilingual settings -- both zero-shot
transfer learning and joint training. Lastly, we find that the new format is
more robust and almost completely devoid of hallucination -- an issue we find
common in existing formats. With well over 1,000 experiments studying 14
different formats across 7 diverse public benchmarks -- including 3 multilingual
datasets spanning 7 languages -- we believe our findings provide a strong
empirical basis for understanding how we should tackle sequence tagging tasks.
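To make the format question concrete, here is a minimal sketch of two hypothetical ways a NER-style tagging example could be serialized into Seq2Seq (input, target) string pairs. The function names, tag scheme, and example sentence are assumptions for illustration; they are not the exact formats studied in the paper.
```python
# Illustration only: two hypothetical ways to serialize a tagging example
# into Seq2Seq (input, target) strings. Not the paper's exact formats.
from typing import List, Tuple


def tagged_tokens_format(tokens: List[str], tags: List[str]) -> Tuple[str, str]:
    """Target repeats every input token next to its tag (longer target)."""
    source = " ".join(tokens)
    target = " ".join(f"{tok}|{tag}" for tok, tag in zip(tokens, tags))
    return source, target


def tags_only_format(tokens: List[str], tags: List[str]) -> Tuple[str, str]:
    """Target is only the tag sequence, aligned by position (shorter target,
    leaving less room for the decoder to hallucinate input tokens)."""
    source = " ".join(tokens)
    target = " ".join(tags)
    return source, target


tokens = ["Alice", "works", "at", "Acme"]
tags = ["B-PER", "O", "O", "B-ORG"]
print(tagged_tokens_format(tokens, tags))
# ('Alice works at Acme', 'Alice|B-PER works|O at|O Acme|B-ORG')
print(tags_only_format(tokens, tags))
# ('Alice works at Acme', 'B-PER O O B-ORG')
```
Trade-offs such as target length and the opportunity for the decoder to produce tokens that never appeared in the input fall directly out of this kind of format choice.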
Related papers
- Instruction Position Matters in Sequence Generation with Large Language
Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions to after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence
Understanding [103.34092301324425]
Large language models (LLMs) have shown impressive ability for open-domain NLP tasks.
We present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding.
arXiv Detail & Related papers (2023-08-21T07:31:19Z)
- On Measuring Social Biases in Prompt-Based Multi-Task Learning [1.3270286124913757]
We study T0, a large-scale multi-task text-to-text language model trained using prompt-based learning.
We consider two different forms of semantically equivalent inputs: question-answer format and premise-hypothesis format.
arXiv Detail & Related papers (2022-05-23T20:01:20Z)
- Rethinking the Role of Demonstrations: What Makes In-Context Learning
Work? [112.72413411257662]
Large language models (LMs) are able to in-context learn by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
We show that ground truth demonstrations are in fact not required -- randomly replacing labels in the demonstrations barely hurts performance.
We find that other aspects of the demonstrations are the key drivers of end task performance.
arXiv Detail & Related papers (2022-02-25T17:25:19Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with
Synthetic Data [2.225882303328135]
We propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parsing task.
Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems.
arXiv Detail & Related papers (2021-09-09T14:51:11Z)
- Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural
Cross-Lingual Information Retrieval [15.902630454568811]
We propose a novel Mixed Attention Transformer (MAT) that incorporates external word level knowledge, such as a dictionary or translation table.
By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence.
arXiv Detail & Related papers (2021-09-07T00:33:14Z)
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word
Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese
Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
- Copy that! Editing Sequences by Copying Spans [40.23377412674599]
We present an extension of seq2seq models capable of copying entire spans of the input to the output in one step.
In experiments on a range of editing tasks of natural language and source code, we show that our new model consistently outperforms simpler baselines.
arXiv Detail & Related papers (2020-06-08T17:42:18Z)
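To make the span-copying idea in the entry above concrete, here is a minimal, hypothetical sketch (not the paper's model or API): an edit is encoded as a sequence of actions, each of which either copies a whole span of the input in one step or generates a single new token. The action names and encoding are assumptions for illustration.
```python
# Hypothetical action encoding for span-copy editing (illustration only,
# not the model from the paper above).
from typing import List, Tuple, Union

Action = Union[Tuple[str, int, int],  # ("COPY", start, end): copy source[start:end]
               Tuple[str, str]]       # ("GEN", token): emit one new token


def apply_actions(source: List[str], actions: List[Action]) -> List[str]:
    """Replay copy/generate actions to produce the edited token sequence."""
    out: List[str] = []
    for action in actions:
        if action[0] == "COPY":
            _, start, end = action
            out.extend(source[start:end])
        else:  # "GEN"
            out.append(action[1])
    return out


source = ["def", "add", "(", "a", ",", "b", ")", ":"]
# Rename the function; everything else is reused via two single-step span copies.
actions: List[Action] = [("COPY", 0, 1), ("GEN", "add_ints"), ("COPY", 2, 8)]
print(apply_actions(source, actions))
# ['def', 'add_ints', '(', 'a', ',', 'b', ')', ':']
```
When large parts of the input are unchanged, this kind of encoding needs far fewer decoding steps than emitting the output token by token.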
This list is automatically generated from the titles and abstracts of the papers on this site.