Transforming Sequence Tagging Into A Seq2Seq Task
- URL: http://arxiv.org/abs/2203.08378v1
- Date: Wed, 16 Mar 2022 03:48:14 GMT
- Title: Transforming Sequence Tagging Into A Seq2Seq Task
- Authors: Karthik Raman and Iftekhar Naim and Jiecao Chen and Kazuma Hashimoto
and Kiran Yalasangi and Krishna Srinivasan
- Abstract summary: We study different formats one could use for casting input text sentences and their output labels into the input and target of a Seq2Seq model.
We introduce a new format, which we show to not only be simpler but also more effective.
We find that the new format is more robust and almost completely devoid of hallucination.
- Score: 10.130389627403433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained, large, generative language models (LMs) have had great success in
a wide range of sequence tagging and structured prediction tasks. Casting a
sequence tagging task as a Seq2Seq one requires deciding the formats of the
input and output sequences. However, we lack a principled understanding of the
trade-offs associated with these formats (such as the effect on model accuracy,
sequence length, multilingual generalization, hallucination). In this paper, we
rigorously study different formats one could use for casting input text
sentences and their output labels into the input and target (i.e., output) of a
Seq2Seq model. Along the way, we introduce a new format, which we show to not
only be simpler but also more effective. Additionally, the new format
demonstrates significant gains in multilingual settings -- both zero-shot
transfer learning and joint training. Lastly, we find that the new format is
more robust and almost completely devoid of hallucination -- an issue we find
common in existing formats. With well over 1,000 experiments studying 14
different formats across 7 diverse public benchmarks -- including 3 multilingual
datasets spanning 7 languages -- we believe our findings provide a strong
empirical basis for understanding how we should tackle sequence tagging tasks.
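To make the format question concrete, here is a minimal sketch of two hypothetical ways a NER-style tagging example could be serialized into Seq2Seq (input, target) string pairs. The function names, tag scheme, and example sentence are assumptions for illustration; they are not the exact formats studied in the paper.
```python
# Illustration only: two hypothetical ways to serialize a tagging example
# into Seq2Seq (input, target) strings. Not the paper's exact formats.
from typing import List, Tuple


def tagged_tokens_format(tokens: List[str], tags: List[str]) -> Tuple[str, str]:
    """Target repeats every input token next to its tag (longer target)."""
    source = " ".join(tokens)
    target = " ".join(f"{tok}|{tag}" for tok, tag in zip(tokens, tags))
    return source, target


def tags_only_format(tokens: List[str], tags: List[str]) -> Tuple[str, str]:
    """Target is only the tag sequence, aligned by position (shorter target,
    leaving less room for the decoder to hallucinate input tokens)."""
    source = " ".join(tokens)
    target = " ".join(tags)
    return source, target


tokens = ["Alice", "works", "at", "Acme"]
tags = ["B-PER", "O", "O", "B-ORG"]
print(tagged_tokens_format(tokens, tags))
# ('Alice works at Acme', 'Alice|B-PER works|O at|O Acme|B-ORG')
print(tags_only_format(tokens, tags))
# ('Alice works at Acme', 'B-PER O O B-ORG')
```
Trade-offs such as target length and the opportunity for the decoder to produce tokens that never appeared in the input fall directly out of this kind of format choice.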
Related papers
- Instruction Position Matters in Sequence Generation with Large Language
Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions to after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence
Understanding [103.34092301324425]
Large language models (LLMs) have shown impressive ability for open-domain NLP tasks.
We present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding.
arXiv Detail & Related papers (2023-08-21T07:31:19Z)
- On Measuring Social Biases in Prompt-Based Multi-Task Learning [1.3270286124913757]
We study T0, a large-scale multi-task text-to-text language model trained using prompt-based learning.
We consider two different forms of semantically equivalent inputs: question-answer format and premise-hypothesis format.
arXiv Detail & Related papers (2022-05-23T20:01:20Z)
- Rethinking the Role of Demonstrations: What Makes In-Context Learning
Work? [112.72413411257662]
Large language models (LMs) are able to in-context learn by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
We show that ground truth demonstrations are in fact not required -- randomly replacing labels in the demonstrations barely hurts performance.
We find that other aspects of the demonstrations are the key drivers of end task performance.
arXiv Detail & Related papers (2022-02-25T17:25:19Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with
Synthetic Data [2.225882303328135]
We propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parsing task.
Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems.
arXiv Detail & Related papers (2021-09-09T14:51:11Z)
- Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural
Cross-Lingual Information Retrieval [15.902630454568811]
We propose a novel Mixed Attention Transformer (MAT) that incorporates external word level knowledge, such as a dictionary or translation table.
By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence.
arXiv Detail & Related papers (2021-09-07T00:33:14Z)
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word
Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese
Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
- Copy that! Editing Sequences by Copying Spans [40.23377412674599]
We present an extension of seq2seq models capable of copying entire spans of the input to the output in one step.
In experiments on a range of editing tasks of natural language and source code, we show that our new model consistently outperforms simpler baselines.
arXiv Detail & Related papers (2020-06-08T17:42:18Z)
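To make the span-copying idea in the entry above concrete, here is a minimal, hypothetical sketch (not the paper's model or API): an edit is encoded as a sequence of actions, each of which either copies a whole span of the input in one step or generates a single new token. The action names and encoding are assumptions for illustration.
```python
# Hypothetical action encoding for span-copy editing (illustration only,
# not the model from the paper above).
from typing import List, Tuple, Union

Action = Union[Tuple[str, int, int],  # ("COPY", start, end): copy source[start:end]
               Tuple[str, str]]       # ("GEN", token): emit one new token


def apply_actions(source: List[str], actions: List[Action]) -> List[str]:
    """Replay copy/generate actions to produce the edited token sequence."""
    out: List[str] = []
    for action in actions:
        if action[0] == "COPY":
            _, start, end = action
            out.extend(source[start:end])
        else:  # "GEN"
            out.append(action[1])
    return out


source = ["def", "add", "(", "a", ",", "b", ")", ":"]
# Rename the function; everything else is reused via two single-step span copies.
actions: List[Action] = [("COPY", 0, 1), ("GEN", "add_ints"), ("COPY", 2, 8)]
print(apply_actions(source, actions))
# ['def', 'add_ints', '(', 'a', ',', 'b', ')', ':']
```
When large parts of the input are unchanged, this kind of encoding needs far fewer decoding steps than emitting the output token by token.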
This list is automatically generated from the titles and abstracts of the papers on this site.