Data Annealing for Informal Language Understanding Tasks
- URL: http://arxiv.org/abs/2004.13833v1
- Date: Fri, 24 Apr 2020 09:27:09 GMT
- Title: Data Annealing for Informal Language Understanding Tasks
- Authors: Jing Gu, Zhou Yu
- Abstract summary: We propose a data annealing transfer learning procedure to bridge the performance gap on informal language tasks.
It successfully utilizes a pre-trained model such as BERT in informal language.
- Score: 66.2988222278475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a huge performance gap between formal and informal language
understanding tasks. The recent pre-trained models that improved the
performance of formal language understanding tasks did not achieve a comparable
result on informal language. We pro-pose a data annealing transfer learning
procedure to bridge the performance gap on informal natural language
understanding tasks. It successfully utilizes a pre-trained model such as BERT
in informal language. In our data annealing procedure, the training set
contains mainly formal text data at first; then, the proportion of the informal
text data is gradually increased during the training process. Our data
annealing procedure is model-independent and can be applied to various tasks.
We validate its effectiveness in exhaustive experiments. When BERT is
implemented with our learning procedure, it outperforms all the
state-of-the-art models on the three common informal language tasks.
Related papers
- Machine Translation to Control Formality Features in the Target Language [0.9208007322096532]
This research explores how machine learning methods are used to translate from English to languages with formality.
It was done by training a bilingual model in a formality-controlled setting and comparing its performance with a pre-trained multilingual model.
We evaluate the official formality accuracy(ACC) by comparing the predicted masked tokens with the ground truth.
arXiv Detail & Related papers (2023-11-22T15:42:51Z) - Subspace Chronicles: How Linguistic Information Emerges, Shifts and
Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds.
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z) - Pre-Trained Language-Meaning Models for Multilingual Parsing and
Generation [14.309869321407522]
We introduce multilingual pre-trained language-meaning models based on Discourse Representation Structures (DRSs)
Since DRSs are language neutral, cross-lingual transfer learning is adopted to further improve the performance of non-English tasks.
automatic evaluation results show that our approach achieves the best performance on both the multilingual DRS parsing and DRS-to-text generation tasks.
arXiv Detail & Related papers (2023-05-31T19:00:33Z) - Annotated Dataset Creation through General Purpose Language Models for
non-English Medical NLP [0.5482532589225552]
In our work we suggest to leverage pretrained language models for training data acquisition.
We create a custom dataset which we use to train a medical NER model for German texts, GPTNERMED.
arXiv Detail & Related papers (2022-08-30T18:42:55Z) - Curriculum-Based Self-Training Makes Better Few-Shot Learners for
Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z) - Actuarial Applications of Natural Language Processing Using
Transformers: Case Studies for Using Text Features in an Actuarial Context [0.0]
This tutorial demonstrates to incorporate text data into actuarial classification and regression tasks.
The main focus is on methods employing transformer-based models.
The case studies tackle challenges related to a multi-lingual setting and long input sequences.
arXiv Detail & Related papers (2022-06-04T15:39:30Z) - LICHEE: Improving Language Model Pre-training with Multi-grained
Tokenization [19.89228774074371]
We propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of input text.
Our method can be applied to various pre-trained language models and improve their representation capability.
arXiv Detail & Related papers (2021-08-02T12:08:19Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z) - Pre-Training a Language Model Without Human Language [74.11825654535895]
We study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.
We find that models pre-trained on unstructured data beat those trained directly from scratch on downstream tasks.
To our great astonishment, we uncover that pre-training on certain non-human language data gives GLUE performance close to performance pre-trained on another non-English language.
arXiv Detail & Related papers (2020-12-22T13:38:06Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.