Analysing the Effect of Masking Length Distribution of MLM: An
Evaluation Framework and Case Study on Chinese MRC Datasets
- URL: http://arxiv.org/abs/2110.15712v1
- Date: Wed, 29 Sep 2021 04:07:05 GMT
- Title: Analysing the Effect of Masking Length Distribution of MLM: An
Evaluation Framework and Case Study on Chinese MRC Datasets
- Authors: Changchang Zeng and Shaobo Li
- Abstract summary: The masked language model (MLM) is a self-supervised training objective widely used in various PTMs.
In different machine reading comprehension tasks, the length of the answer is also different, and the answer is often a word, phrase, or sentence.
In this paper, we try to uncover how much of MLM's success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer lengths in the MRC dataset.
- Score: 0.8566457170664925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine reading comprehension (MRC) is a challenging natural language
processing (NLP) task. Recently, the emergence of pre-trained models (PTM) has
brought this research field into a new era, in which the training objective
plays a key role. The masked language model (MLM) is a self-supervised training
objective that widely used in various PTMs. With the development of training
objectives, many variants of MLM have been proposed, such as whole word
masking, entity masking, phrase masking, span masking, and so on. In different
MLM, the length of the masked tokens is different. Similarly, in different
machine reading comprehension tasks, the length of the answer is also
different, and the answer is often a word, phrase, or sentence. Thus, in MRC
tasks with different answer lengths, whether the length of MLM is related to
performance is a question worth studying. If this hypothesis is true, it can
guide us how to pre-train the MLM model with a relatively suitable mask length
distribution for MRC task. In this paper, we try to uncover how much of MLM's
success in the machine reading comprehension tasks comes from the correlation
between masking length distribution and answer length in MRC dataset. In order
to address this issue, herein, (1) we propose four MRC tasks with different
answer length distributions, namely short span extraction task, long span
extraction task, short multiple-choice cloze task, long multiple-choice cloze
task; (2) four Chinese MRC datasets are created for these tasks; (3) we also
have pre-trained four masked language models according to the answer length
distributions of these datasets; (4) ablation experiments are conducted on the
datasets to verify our hypothesis. The experimental results demonstrate that
our hypothesis is true.
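The core recipe in the abstract can be made concrete with a short sketch: estimate the answer-length distribution of an MRC dataset, then mask pre-training spans whose lengths follow that distribution. The snippet below is a minimal illustration under assumed conventions (a tokenized "answer" field, a 15% masking budget, hypothetical function names), not the authors' released code.

```python
# Minimal sketch, assuming each dataset example is a dict with a tokenized
# "answer" field and that masking covers roughly 15% of the input tokens.
import random
from collections import Counter


def answer_length_distribution(dataset):
    """Empirical answer-length distribution (in tokens) of an MRC dataset."""
    counts = Counter(len(example["answer"]) for example in dataset)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def sample_span_length(length_dist):
    """Draw one masking span length from the estimated distribution."""
    lengths, probs = zip(*sorted(length_dist.items()))
    return random.choices(lengths, weights=probs, k=1)[0]


def mask_spans(tokens, length_dist, mask_token="[MASK]", mask_ratio=0.15):
    """Mask contiguous spans until roughly `mask_ratio` of tokens are covered.

    Structurally this mirrors span-style MLM variants (whole word, phrase,
    span masking); the difference is that span lengths are sampled from the
    dataset's answer-length distribution. Spans may overlap in this toy version.
    """
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    masked = 0
    while masked < budget:
        span_len = min(sample_span_length(length_dist), budget - masked)
        start = random.randrange(0, len(tokens) - span_len + 1)
        for i in range(start, start + span_len):
            tokens[i] = mask_token
        masked += span_len
    return tokens


# Toy usage: short answers in the dataset lead to mostly short masked spans.
toy_dataset = [
    {"answer": ["北京"]},
    {"answer": ["九", "月", "一", "日"]},
    {"answer": ["一", "种", "预", "训", "练", "模", "型"]},
]
dist = answer_length_distribution(toy_dataset)
passage = list("机器阅读理解是一项具有挑战性的自然语言处理任务")
print(mask_spans(passage, dist))
```

In this toy setup, a dataset dominated by one- or two-character answers yields mostly word-length masks, mirroring the short span extraction setting, while longer answers shift masking toward phrase- or sentence-length spans.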
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Pre-training LLMs using human-like development data corpus [3.5757761767474876]
We pre-train and evaluate Large Language Models (LLMs) on their ability to learn contextual word representations using roughly the same number of tokens as seen by children.
We provide a strong set of baselines with different architectures, evaluate changes in performance across epochs, and report pre-training metrics for the strict-small and strict tracks of the task.
arXiv Detail & Related papers (2023-11-08T13:13:23Z) - M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models [58.54538318912159]
M4LE is a benchmark for evaluating the long-sequence capability of large language models (LLMs).
M4LE is based on a diverse NLP task pool comprising 36 NLP task types and 12 domains.
We conducted a systematic evaluation on 11 well-established LLMs, especially those optimized for long-sequence inputs.
arXiv Detail & Related papers (2023-10-30T03:11:30Z) - LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient
Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z) - Enhancing In-Context Learning with Answer Feedback for Multi-Span
Question Answering [9.158919909909146]
In this paper, we propose a novel way of employing labeled data such that it informs the LLM of some undesired output.
Experiments on three multi-span question answering datasets and a keyphrase extraction dataset show that our new prompting strategy consistently improves LLM's in-context learning performance.
arXiv Detail & Related papers (2023-06-07T15:20:24Z) - Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A
Preliminary Study on Writing Assistance [60.40541387785977]
Small foundational models can display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data.
In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following.
Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks.
arXiv Detail & Related papers (2023-05-22T16:56:44Z) - Bridging the Gap between Language Model and Reading Comprehension:
Unsupervised MRC via Self-Supervision [34.01738910736325]
We propose a new framework for unsupervised machine reading comprehension (MRC).
We learn to spot answer spans in documents via self-supervised learning, by designing a self-supervision pretext task for MRC - Spotting-MLM.
Experiments show that our method achieves a new state-of-the-art performance for unsupervised MRC.
arXiv Detail & Related papers (2021-07-19T02:14:36Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word
Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: pre-trained MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)