MELM: Data Augmentation with Masked Entity Language Modeling for
Cross-lingual NER
- URL: http://arxiv.org/abs/2108.13655v1
- Date: Tue, 31 Aug 2021 07:37:43 GMT
- Title: MELM: Data Augmentation with Masked Entity Language Modeling for
Cross-lingual NER
- Authors: Ran Zhou, Ruidan He, Xin Li, Lidong Bing, Erik Cambria, Luo Si,
Chunyan Miao
- Abstract summary: We propose a data augmentation framework with Masked-Entity Language Modeling (MELM)
MELM linearizes NER labels into sentence context, and thus the fine-tuned MELM is able to predict masked tokens by explicitly conditioning on their labels.
When unlabeled target data is available, MELM can be further applied to augment pseudo-labeled target data, and the performance gain reaches 5.7%.
- Score: 73.91145686634133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation for cross-lingual NER requires fine-grained control over
token labels of the augmented text. Existing augmentation approaches based on
masked language modeling may replace a labeled entity with words of a different
class, which makes the augmented sentence incompatible with the original label
sequence and thus hurts performance. We propose a data augmentation framework
with Masked-Entity Language Modeling (MELM) which effectively ensures that the
replacement entities fit the original labels. Specifically, MELM linearizes
NER labels into sentence context, and thus the fine-tuned MELM is able to
predict masked tokens by explicitly conditioning on their labels. Our MELM is
agnostic to the source of the data to be augmented: when it is applied to
augment training data of the source language, it achieves up to a 3.5% F1 score
improvement for cross-lingual NER. When unlabeled target data is available,
MELM can be further applied to augment pseudo-labeled target data, and the
performance gain reaches 5.7%. Moreover, MELM consistently
outperforms multiple baseline methods for data augmentation.
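The abstract describes MELM procedurally rather than with code, so the following is a minimal, self-contained sketch of the label-linearization and entity-masking step as we read it from the abstract. The marker format, the example sentence, and the masking probability are illustrative assumptions rather than the authors' implementation; the fine-tuning of a multilingual MLM on such sequences is only described in comments.

```python
import random

# Illustrative BIO-tagged sentence (tokens + per-token NER labels); not from the paper.
tokens = ["Obama", "visited", "Berlin", "yesterday"]
labels = ["B-PER", "O", "B-LOC", "O"]

def linearize(tokens, labels):
    """Surround each entity token with markers derived from its label,
    so the label information is placed directly in the sentence context."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            out.append(tok)
        else:
            out.append(f"<{lab}> {tok} </{lab}>")  # assumed marker format
    return out

def mask_entities(linearized, labels, mask_token="[MASK]", p=0.7):
    """Randomly replace entity tokens (but not their label markers) with the
    mask token; a fine-tuned MLM would then propose label-consistent fillers."""
    masked = []
    for item, lab in zip(linearized, labels):
        if lab != "O" and random.random() < p:
            masked.append(f"<{lab}> {mask_token} </{lab}>")
        else:
            masked.append(item)
    return " ".join(masked)

print(mask_entities(linearize(tokens, labels), labels))
# e.g. "<B-PER> [MASK] </B-PER> visited <B-LOC> Berlin </B-LOC> yesterday"
```

A multilingual masked language model fine-tuned on such linearized, entity-masked sentences can then fill the masked positions while conditioning on the surrounding label markers, which is the property the abstract relies on for producing label-compatible augmented sentences.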
Related papers
- Entity Alignment with Noisy Annotations from Large Language Models [15.189701951003611]
We propose a unified framework, LLM4EA, to effectively leverage Large Language Models for EA.
Specifically, we design a novel active learning policy to significantly reduce the annotation space.
We iteratively optimize the policy based on the feedback from a base EA model.
arXiv Detail & Related papers (2024-05-27T03:52:55Z)
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model [5.756426081817803]
This paper introduces a new data augmentation method for neural machine translation.
Our method is based on a Conditional Masked Language Model (CMLM).
We show that CMLM is capable of enforcing semantic consistency by conditioning on both source and target during substitution.
arXiv Detail & Related papers (2022-09-22T09:19:08Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation [25.781293857729864]
Masked Label Smoothing (MLS) is a new mechanism that masks the soft label probability of source-side words to zero.
Our experiments show that MLS consistently yields improvement over original label smoothing on different datasets.
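As a rough illustration of the mechanism summarized in this entry (not the paper's code), the sketch below builds smoothed target distributions in PyTorch in which the smoothing mass is spread only over an assumed target-side vocabulary mask, so source-side words receive zero soft-label probability.

```python
import torch

def masked_label_smoothing_targets(gold_ids, target_side_mask, eps=0.1):
    """Build smoothed one-hot targets where the eps mass is distributed
    only over target-language vocabulary entries (source-side prob = 0).

    gold_ids:         LongTensor [batch] of gold target token ids
    target_side_mask: BoolTensor [vocab], True for target-side tokens (assumed given)
    """
    vocab_size = target_side_mask.numel()
    smooth = torch.zeros(gold_ids.size(0), vocab_size)
    # spread eps uniformly over target-side tokens only
    smooth[:, target_side_mask] = eps / target_side_mask.sum()
    # put the remaining mass on the gold token
    smooth[torch.arange(gold_ids.size(0)), gold_ids] += 1.0 - eps
    return smooth  # rows sum to 1 when the gold tokens are target-side

# toy usage: vocabulary of 6, ids 3..5 belong to the target language
mask = torch.tensor([False, False, False, True, True, True])
targets = masked_label_smoothing_targets(torch.tensor([4, 5]), mask)
```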
arXiv Detail & Related papers (2022-03-06T07:01:39Z)
- Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
A key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by inter-residue co-variation in the sequences.
We propose a novel method to capture this information directly by pre-training via a dedicated language model, the Pairwise Masked Language Model (PMLM).
Our results show that the proposed method can effectively capture inter-residue correlations and improves the performance of contact prediction by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
- Label Mask for Multi-Label Text Classification [6.742627397194543]
We propose a Label Mask multi-label text classification model (LM-MTC), inspired by the idea of cloze questions in language modeling.
On this basis, we assign a different token to each potential label and randomly mask the token with a certain probability to build a label-based Masked Language Model (MLM).
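A minimal sketch of that label-masking idea as described in this entry, under the assumption of a small illustrative label set, marker-token format, and masking probability; the actual LM-MTC input construction may differ.

```python
import random

LABELS = ["sports", "politics", "science"]                 # assumed candidate labels
label_tokens = {lab: f"[LBL_{lab.upper()}]" for lab in LABELS}

def build_label_mask_input(text, mask_prob=0.15, mask_token="[MASK]"):
    """Prefix the text with one token per potential label and randomly mask
    some of them; an MLM trained on such inputs is asked to recover the masked
    label tokens cloze-style, conditioning on the document and the other labels."""
    prefix, targets = [], {}                                # position -> masked label token
    for i, lab in enumerate(LABELS):
        tok = label_tokens[lab]
        if random.random() < mask_prob:
            prefix.append(mask_token)
            targets[i] = tok
        else:
            prefix.append(tok)
    return " ".join(prefix) + " [SEP] " + text, targets

inp, targets = build_label_mask_input("The team won the championship final.")
print(inp)      # e.g. "[LBL_SPORTS] [MASK] [LBL_SCIENCE] [SEP] The team won ..."
print(targets)  # masked positions the MLM should recover
```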
arXiv Detail & Related papers (2021-06-18T11:54:33Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Since gold labels are not available for the translated text in the target language during training, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for that translated text.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
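For the FILTER entry above, the sketch below shows one way a KL-divergence self-teaching loss over soft pseudo-labels could look in PyTorch; the function name, tensor shapes, and temperature are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def self_teaching_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) against auto-generated soft pseudo-labels.

    student_logits: [batch, num_classes], predictions on translated target-language text
    teacher_logits: [batch, num_classes], source of the soft pseudo-labels (no gradient)
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # 'batchmean' matches the mathematical definition of KL divergence
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

# toy usage with random logits
loss = self_teaching_kl_loss(torch.randn(4, 3), torch.randn(4, 3))
```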
This list is automatically generated from the titles and abstracts of the papers on this site.