Unsupervised Improvement of Factual Knowledge in Language Models
- URL: http://arxiv.org/abs/2304.01597v1
- Date: Tue, 4 Apr 2023 07:37:06 GMT
- Title: Unsupervised Improvement of Factual Knowledge in Language Models
- Authors: Nafis Sadeq, Byungkyu Kang, Prarit Lamba, Julian McAuley
- Abstract summary: Masked language modeling plays a key role in pretraining large language models.
We propose an approach for influencing pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
- Score: 4.5788796239850225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked language modeling (MLM) plays a key role in pretraining large language
models. But the MLM objective is often dominated by high-frequency words that
are sub-optimal for learning factual knowledge. In this work, we propose an
approach for influencing MLM pretraining in a way that can improve language
model performance on a variety of knowledge-intensive tasks. We force the
language model to prioritize informative words in a fully unsupervised way.
Experiments demonstrate that the proposed approach can significantly improve
the performance of pretrained language models on tasks such as factual recall,
question answering, sentiment analysis, and natural language inference in a
closed-book setting.
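The abstract does not say how informative words are identified; the sketch below is a minimal illustration, assuming an inverse-frequency heuristic as the unsupervised informativeness signal and using hypothetical helpers (informativeness_scores, choose_mask_positions), of how MLM mask positions could be sampled in favor of rarer, fact-bearing tokens instead of uniformly.

```python
import math
import random
from collections import Counter

def informativeness_scores(corpus_tokens):
    """Score each token type by rarity (higher = more informative).
    The inverse-frequency heuristic is an assumption for illustration;
    the paper only states that informative words are prioritized in a
    fully unsupervised way."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: -math.log(c / total) + 1e-6 for tok, c in counts.items()}

def choose_mask_positions(sentence_tokens, scores, mask_ratio=0.15):
    """Sample MLM mask positions with probability proportional to each
    token's informativeness score rather than uniformly at random."""
    default = max(scores.values())
    weights = [scores.get(tok, default) for tok in sentence_tokens]
    k = max(1, int(mask_ratio * len(sentence_tokens)))
    # weighted sampling without replacement via exponential sort keys
    keyed = sorted(range(len(sentence_tokens)),
                   key=lambda i: random.expovariate(1.0) / weights[i])
    return sorted(keyed[:k])

corpus = "the capital of france is paris the capital of japan is tokyo".split()
scores = informativeness_scores(corpus)
print(choose_mask_positions("the capital of france is paris".split(), scores))
```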
Related papers
- MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
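As a rough illustration of that design (frozen base plus a trainable appended module), the PyTorch sketch below wraps a base LM with a small mixture-of-experts block; the class name, layer sizes, and gating scheme are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class FrozenBaseWithExperts(nn.Module):
    """Schematic of the idea described above: freeze the base LM's parameters
    and train only a small appended mixture-of-experts block on the new
    multilingual data. Shapes, gating, and the residual combination are
    illustrative assumptions, not the paper's exact architecture."""

    def __init__(self, base_lm: nn.Module, hidden: int, n_experts: int = 4):
        super().__init__()
        self.base_lm = base_lm
        for p in self.base_lm.parameters():
            p.requires_grad_(False)          # safeguard high-resource abilities
        self.gate = nn.Linear(hidden, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base_lm(x)                                   # frozen representation
        gate = torch.softmax(self.gate(h), dim=-1)            # [..., n_experts]
        expert_out = torch.stack([e(h) for e in self.experts], dim=-1)  # [..., hidden, n_experts]
        return h + (expert_out * gate.unsqueeze(-2)).sum(dim=-1)        # residual MoE update
```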
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses an identical procedure and prompts for all tasks we investigate.
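The debate protocol above can be pictured with a short Python sketch; the function multiagent_debate, its prompt wording, and the two-round default are hypothetical stand-ins for whatever procedure the paper actually uses.

```python
from typing import Callable, List

def multiagent_debate(question: str,
                      agents: List[Callable[[str], str]],
                      rounds: int = 2) -> List[str]:
    """Toy version of the debate loop described above: each agent answers,
    then revises its answer after reading the other agents' latest answers.
    Prompt wording and round count are illustrative guesses, not the paper's
    exact procedure."""
    answers = [agent(question) for agent in agents]
    for _ in range(rounds):
        revised = []
        for i, agent in enumerate(agents):
            others = "\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (f"{question}\n\nOther agents answered:\n{others}\n\n"
                      "Considering their reasoning, give your updated answer.")
            revised.append(agent(prompt))
        answers = revised
    return answers  # a final aggregation step (e.g. majority vote) would follow
```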
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- An Overview on Language Models: Recent Developments and Outlook [32.528770408502396]
Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner.
Pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications.
arXiv Detail & Related papers (2023-03-10T07:55:00Z)
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
arXiv Detail & Related papers (2022-09-30T19:56:04Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- Universal Sentence Representation Learning with Conditional Masked Language Model [7.334766841801749]
We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
arXiv Detail & Related papers (2020-12-28T18:06:37Z)
- DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries [8.83363871195679]
Pre-trained multilingual models typically rely on the masked language modeling (MLM) objective as the key language learning objective.
DICT-MLM works by incentivizing the model to be able to predict not just the original masked word, but potentially any of its cross-lingual synonyms as well.
Our empirical analysis on multiple downstream tasks spanning 30+ languages, demonstrates the efficacy of the proposed approach.
arXiv Detail & Related papers (2020-10-23T17:53:11Z)
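To make the DICT-MLM objective above concrete, here is a minimal PyTorch sketch of a multi-target masked-word loss; dict_mlm_loss and its summing of probability mass over the synonym set are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dict_mlm_loss(logits: torch.Tensor, target_id: int, synonym_ids: list) -> torch.Tensor:
    """Illustrative multi-target loss for one masked position in the spirit of
    DICT-MLM: the model is credited for predicting the original word or any of
    its dictionary-provided cross-lingual synonyms.
    logits: [vocab_size] prediction scores for the masked position."""
    log_probs = F.log_softmax(logits, dim=-1)
    valid = torch.tensor([target_id] + list(synonym_ids))
    # negative log of the total probability assigned to all acceptable targets
    return -torch.logsumexp(log_probs[valid], dim=0)

# usage on a toy vocabulary of size 10, where ids 5 and 7 are synonyms of id 3
loss = dict_mlm_loss(torch.randn(10), target_id=3, synonym_ids=[5, 7])
```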