ANNA: Enhanced Language Representation for Question Answering
- URL: http://arxiv.org/abs/2203.14507v1
- Date: Mon, 28 Mar 2022 05:26:52 GMT
- Title: ANNA: Enhanced Language Representation for Question Answering
- Authors: Changwook Jun, Hansol Jang, Myoseop Sim, Hyun Kim, Jooyoung Choi,
Kyungkoo Min, Kyunghoon Bae
- Abstract summary: We show how these approaches affect performance individually, and that a language model achieves its best results when they are jointly considered during pre-training.
We propose an extended pre-training task and a new neighbor-aware mechanism that attends more strongly to neighboring tokens to capture richer context for pre-training language modeling.
Our best model achieves new state-of-the-art results of 95.7% F1 and 90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet on the SQuAD 2.0 benchmark.
- Score: 5.713808202873983
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Pre-trained language models have brought significant improvements in
performance across a variety of natural language processing tasks. Most existing
models that achieve state-of-the-art results present their approaches from the
separate perspectives of data processing, pre-training tasks, neural network
modeling, or fine-tuning. In this paper, we demonstrate how these approaches
affect performance individually, and show that a language model achieves its
best results on a specific question answering task when those approaches are
jointly considered during pre-training. In particular, we propose an extended
pre-training task and a new neighbor-aware mechanism that attends more strongly
to neighboring tokens to capture richer context for pre-training language
modeling. Our best model achieves new state-of-the-art results of 95.7% F1 and
90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models
such as RoBERTa, ALBERT, ELECTRA, and XLNet on the SQuAD 2.0 benchmark.
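The abstract does not spell out the exact form of the neighbor-aware mechanism. As a rough, hypothetical illustration of the general idea, the sketch below adds a distance-dependent bonus to standard scaled dot-product attention logits so that each token attends more strongly to its neighbors; the window size, bias value, and function shape are illustrative assumptions, not ANNA's actual implementation.

```python
# Hypothetical sketch of neighbor-aware attention: a distance-based bonus is
# added to the attention logits so nearby tokens receive more weight. The
# window and bias values are illustrative, not ANNA's published formulation.
import torch
import torch.nn.functional as F

def neighbor_aware_attention(q, k, v, window=2, neighbor_bias=1.0):
    """Scaled dot-product attention with an additive bias for nearby tokens.

    q, k, v: tensors of shape (batch, seq_len, dim)
    window:  tokens within this distance of the query receive the extra bias
    """
    dim = q.size(-1)
    scores = q @ k.transpose(-2, -1) / dim ** 0.5        # (batch, seq, seq)

    seq_len = q.size(-2)
    pos = torch.arange(seq_len, device=q.device)
    dist = (pos[None, :] - pos[:, None]).abs()           # pairwise token distances
    scores = scores + (dist <= window).float() * neighbor_bias

    attn = F.softmax(scores, dim=-1)
    return attn @ v
```

In a full model such a bias would live inside each self-attention layer of the Transformer encoder; it is shown here as a standalone function only to make the neighbor-weighting step explicit.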
Related papers
- Pretrained Generative Language Models as General Learning Frameworks for
Sequence-Based Tasks [0.0]
We propose that small pretrained foundational generative language models can be utilized as a general learning framework for sequence-based tasks.
Our proposal overcomes the computational resource, skill set, and timeline challenges associated with training neural networks and language models from scratch.
We demonstrate that 125M, 350M, and 1.3B parameter pretrained foundational language models can be instruction fine-tuned with 10,000 to 1,000,000 instruction examples.
arXiv Detail & Related papers (2024-02-08T12:19:32Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms, including L2-SP, Mixout and SMART; a rough sketch of the noise-stability idea appears after this list.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a single sequence generation task, using a generative language model with unidirectional attention; a sketch of this reformulation appears after this list.
Our approach outperforms the previous state of the art (based on BERT) by a large margin on average in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - A Comparative Study on Language Models for Task-Oriented Dialogue
Systems [14.634286037008017]
In task-oriented dialogue (ToD) systems, language models can be used for end-to-end training.
BART and T5 outperform GPT-based models in BLEU and F1 scores and achieve state-of-the-art performance in a ToD system.
arXiv Detail & Related papers (2022-01-21T13:24:25Z) - Language Models are Few-Shot Butlers [0.2538209532048867]
We introduce a two-stage procedure to learn from a small set of demonstrations and further improve by interacting with an environment.
We show that language models fine-tuned with only 1.2% of the expert demonstrations and a simple reinforcement learning algorithm achieve a 51% absolute improvement in success rate over existing methods in the ALFWorld environment.
arXiv Detail & Related papers (2021-04-16T08:47:07Z) - Learning Better Sentence Representation with Syntax Information [0.0]
We propose a novel approach to combining syntax information with a pre-trained language model.
Our model achieves 91.2% accuracy, outperforming the baseline model by 37.8% on the sentence completion task.
arXiv Detail & Related papers (2021-01-09T12:15:08Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z) - Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models
via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
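For the Layerwise Noise Stability Regularization entry above, a minimal sketch of the noise-stability idea follows, under the assumption that the regularizer penalizes the distance between hidden representations computed on clean inputs and on inputs perturbed with standard Gaussian noise. The encoder callable, the noise scale, and the squared-error penalty are illustrative choices, not the paper's exact recipe.

```python
# Minimal sketch of a noise-stability regularizer in the spirit of LNSR:
# penalize how much hidden representations shift when standard Gaussian noise
# is injected into the inputs. `model`, `sigma`, and the squared-error penalty
# are illustrative assumptions, not the paper's exact formulation.
import torch

def noise_stability_loss(model, input_embeds, sigma=0.1):
    """Penalize sensitivity of hidden representations to Gaussian input noise.

    model:        callable mapping input embeddings to hidden states
    input_embeds: tensor of shape (batch, seq_len, hidden_dim)
    sigma:        standard deviation of the injected noise
    """
    clean_hidden = model(input_embeds)                  # clean forward pass
    noise = torch.randn_like(input_embeds) * sigma      # standard Gaussian noise
    noisy_hidden = model(input_embeds + noise)          # perturbed forward pass
    return ((clean_hidden - noisy_hidden) ** 2).mean()  # representation shift

# During fine-tuning, this term would be added to the task loss, e.g.
# loss = task_loss + reg_weight * noise_stability_loss(encoder, embeds)
```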
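For the few-shot aspect-based sentiment analysis entry, the reformulation of extraction and prediction as sequence generation can be pictured as serializing (aspect term, category, polarity) tuples into a target string that a unidirectional generative language model learns to produce from the review text. The template below is a hypothetical illustration of that general idea, not the paper's actual target format.

```python
# Hypothetical sketch of casting aspect-based sentiment analysis as sequence
# generation: gold annotations are serialized into a target string, and a
# left-to-right generative LM is trained to produce it from the review text.
# The template here is an illustrative assumption, not the paper's format.

def absa_to_target(annotations):
    """Serialize (aspect term, category, polarity) tuples into a generation target."""
    parts = [
        f"aspect: {term} | category: {category} | polarity: {polarity}"
        for term, category, polarity in annotations
    ]
    return " ; ".join(parts)

review = "The pasta was great but the service was painfully slow."
annotations = [
    ("pasta", "food quality", "positive"),
    ("service", "service general", "negative"),
]

source = f"Review: {review} Extract aspects:"   # model input
target = absa_to_target(annotations)            # generation target
print(source)
print(target)
```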
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.