Adversarial Transfer Learning for Punctuation Restoration
- URL: http://arxiv.org/abs/2004.00248v1
- Date: Wed, 1 Apr 2020 06:19:56 GMT
- Title: Adversarial Transfer Learning for Punctuation Restoration
- Authors: Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan
- Abstract summary: Adversarial multi-task learning is introduced to learn task-invariant knowledge for punctuation prediction.
Experiments are conducted on the IWSLT2011 datasets.
- Score: 58.2201356693101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous studies demonstrate that word embeddings and part-of-speech (POS)
tags are helpful for punctuation restoration tasks. However, two drawbacks
still exist. One is that word embeddings are pre-trained with unidirectional
language modeling objectives, so they contain only left-to-right context
information. The other is that POS tags are provided by an external POS tagger,
which increases computation cost, and incorrectly predicted tags may hurt the
restoration of punctuation marks during decoding. This paper proposes
adversarial transfer learning to address these problems. A pre-trained
bidirectional encoder representations from transformers (BERT) model is used to
initialize a punctuation model, so the transferred parameters carry both
left-to-right and right-to-left representations. Furthermore, adversarial
multi-task learning is introduced to learn task-invariant knowledge for
punctuation prediction: an auxiliary POS tagging task assists the training of
the punctuation prediction task, while adversarial training prevents the shared
parameters from encoding task-specific information. Only the punctuation
prediction task is used to restore marks during decoding, so no extra
computation is required and no incorrect tags from a POS tagger are introduced.
Experiments are conducted on the IWSLT2011 datasets. The results demonstrate
that the punctuation prediction models obtain further performance improvements
from task-invariant knowledge learned through the POS tagging task. Our best
model outperforms the previous state-of-the-art model trained only with lexical
features by up to 9.2% absolute overall F_1-score on the test set.
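As a rough illustration of the training setup the abstract describes, here is a minimal PyTorch sketch of adversarial multi-task learning with a BERT-initialized shared encoder. It is an assumption-laden reconstruction, not the authors' code: the class names, label counts, and the gradient-reversal formulation of adversarial training are illustrative choices.
```python
# Minimal sketch (assumed reconstruction, not the authors' released code) of
# adversarial multi-task learning for punctuation restoration: a shared
# BERT-initialized encoder feeds a punctuation head, an auxiliary POS head,
# and a task discriminator trained through a gradient-reversal layer.
import torch
import torch.nn as nn
from transformers import BertModel


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda on the
    backward pass, so the encoder learns to fool the task discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialPunctuator(nn.Module):
    # Label counts are illustrative: 4 punctuation classes (none, comma,
    # period, question mark) and a hypothetical 45-tag POS inventory.
    def __init__(self, num_punct=4, num_pos=45, lambd=1.0):
        super().__init__()
        # Shared parameters start from pre-trained BERT, so they already
        # carry both left-to-right and right-to-left context.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        self.punct_head = nn.Linear(hidden, num_punct)   # main task
        self.pos_head = nn.Linear(hidden, num_pos)       # auxiliary task
        self.task_disc = nn.Linear(hidden, 2)            # which task is this?
        self.lambd = lambd

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        # Reversed gradients push the shared encoder toward task-invariant
        # representations; a joint training loss would sum cross-entropies
        # from all three heads.
        h_rev = GradientReversal.apply(h, self.lambd)
        return self.punct_head(h), self.pos_head(h), self.task_disc(h_rev)
```
At decoding time only punct_head would be applied, consistent with the abstract's point that inference requires no extra computation and no possibly-incorrect tags from an external POS tagger.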
Related papers
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Pre-trained Token-replaced Detection Model as Few-shot Learner [31.40447168356879]
We propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA.
A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models.
arXiv Detail & Related papers (2022-03-07T09:47:53Z)
- CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and Segmentation [60.28924281991539]
We propose a novel Class-agnostic Semi-supervised Pretraining (CaSP) framework to achieve a more favorable task-specificity balance.
Using 3.6M unlabeled data, we achieve a remarkable performance gain of 4.7% over ImageNet-pretrained baseline on object detection.
arXiv Detail & Related papers (2021-12-09T14:54:59Z)
- Token-Level Supervised Contrastive Learning for Punctuation Restoration [7.9713449581347104]
Punctuation is critical in understanding natural language text.
Most automatic speech recognition systems do not generate punctuation.
Recent work in punctuation restoration heavily utilizes pre-trained language models.
arXiv Detail & Related papers (2021-07-19T18:24:33Z)
- Incorporating External POS Tagger for Punctuation Restoration [11.573672075002007]
Punctuation restoration is an important post-processing step in automatic speech recognition.
Part-of-speech (POS) taggers provide informative tags, suggesting each input token's syntactic role.
We incorporate an external POS tagger and fuse its predicted labels into the existing language model to provide syntactic information.
arXiv Detail & Related papers (2021-06-12T09:58:06Z)
- Reliable Part-of-Speech Tagging of Historical Corpora through Set-Valued Prediction [21.67895423776014]
We consider POS tagging within the framework of set-valued prediction.
We find that extending state-of-the-art POS taggers to set-valued prediction yields more precise and robust taggings.
arXiv Detail & Related papers (2020-08-04T07:21:36Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that a linear layer yields a small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- Machine Learning Approaches for Amharic Parts-of-speech Tagging [0.0]
Performance of the current POS taggers in Amharic is not as good as that of the contemporary POS taggers available for English and other European languages.
The aim of this work is to improve POS tagging performance for Amharic, where accuracy has never exceeded 91%.
arXiv Detail & Related papers (2020-01-10T06:40:49Z)