CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual
Named Entity Recognition
- URL: http://arxiv.org/abs/2305.14913v1
- Date: Wed, 24 May 2023 09:03:01 GMT
- Title: CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual
Named Entity Recognition
- Authors: Tingting Ma, Qianhui Wu, Huiqiang Jiang, B\"orje F. Karlsson, Tiejun
Zhao, Chin-Yew Lin
- Abstract summary: Cross-lingual named entity recognition (NER) aims to train an NER system that generalizes well to a target language by leveraging labeled data in a given source language.
Previous work alleviates the data scarcity problem by translating source-language labeled data or performing knowledge distillation on target-language unlabeled data.
We propose CoLaDa, a Collaborative Label Denoising Framework, to address this problem.
- Score: 30.307982013964576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual named entity recognition (NER) aims to train an NER system that
generalizes well to a target language by leveraging labeled data in a given
source language. Previous work alleviates the data scarcity problem by
translating source-language labeled data or performing knowledge distillation
on target-language unlabeled data. However, these methods may suffer from label
noise due to the automatic labeling process. In this paper, we propose CoLaDa,
a Collaborative Label Denoising Framework, to address this problem.
Specifically, we first explore a model-collaboration-based denoising scheme
that enables models trained on different data sources to collaboratively
denoise pseudo labels used by each other. We then present an
instance-collaboration-based strategy that considers the label consistency of
each token's neighborhood in the representation space for denoising.
Experiments on different benchmark datasets show that the proposed CoLaDa
achieves superior results compared to previous methods, especially when
generalizing to distant languages.
Related papers
- Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition [15.31736490777998]
Cross-lingual named entity recognition (NER) aims to train an NER model for the target language.
We propose a Global-Local Denoising framework (GLoDe) for cross-lingual NER.
Experimental results on two benchmark datasets with six target languages demonstrate that our proposed GLoDe significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-06-03T11:29:19Z) - Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by
Self-Supervised Representation Mixing and Embedding Initialization [57.38123229553157]
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems.
We focus on achieving language adaptation using minimal labeled and unlabeled data.
Experimental results show that our framework is able to synthesize intelligible speech in unseen languages with only 4 utterances of labeled data and 15 minutes of unlabeled data.
arXiv Detail & Related papers (2024-01-23T21:55:34Z) - mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view
Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER)
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z) - Improving Self-training for Cross-lingual Named Entity Recognition with
Contrastive and Prototype Learning [80.08139343603956]
In cross-lingual named entity recognition, self-training is commonly used to bridge the linguistic gap.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement.
Our proposed method, namely ContProto mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling.
arXiv Detail & Related papers (2023-05-23T02:52:16Z) - ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity
Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial
Discriminator for Cross-Lingual NER [2.739898536581301]
We design an adversarial learning framework in which an encoder learns entity domain knowledge from labeled source-language data.
We show that the proposed method benefits strongly from this data selection process and outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-04T07:17:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.