AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial Discriminator for Cross-Lingual NER
- URL: http://arxiv.org/abs/2106.02300v2
- Date: Mon, 7 Jun 2021 05:49:34 GMT
- Title: AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial Discriminator for Cross-Lingual NER
- Authors: Weile Chen, Huiqiang Jiang, Qianhui Wu, Börje F. Karlsson, and Yi Guan
- Abstract summary: We design an adversarial learning framework in which an encoder learns entity domain knowledge from labeled source-language data.
We show that the proposed method benefits strongly from this data selection process and outperforms existing state-of-the-art methods.
- Score: 2.739898536581301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural methods have been shown to achieve high performance in Named Entity Recognition (NER), but they rely on costly, high-quality labeled data for training, which is not always available across languages. While previous work has shown that unlabeled data in a target language can be used to improve cross-lingual model performance, we propose a novel adversarial approach (AdvPicker) to better leverage such data and further improve results. We design an adversarial learning framework in which an encoder learns entity domain knowledge from labeled source-language data and better shared features are captured via adversarial training, where a discriminator selects less language-dependent target-language data based on its similarity to the source language. Experimental results on standard benchmark datasets demonstrate that the proposed method benefits strongly from this data selection process and outperforms existing state-of-the-art methods, without requiring any additional external resources (e.g., gazetteers or machine translation). The code is available at https://aka.ms/AdvPicker
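The selection step described above lends itself to a short illustration. Below is a minimal PyTorch sketch, under stated assumptions, of how a trained language discriminator could pick less language-dependent target-language sentences: those whose source-language probability sits near 0.5, i.e. which the discriminator cannot confidently attribute to either language. The class and function names (LanguageDiscriminator, pick_language_independent), the pooled 768-dimensional sentence representations, and the selection band are illustrative assumptions, not the authors' implementation; the adversarial min-max training loop that produces the discriminator is omitted.

```python
# Minimal sketch (assumed names, not the AdvPicker code) of discriminator-based
# selection of less language-dependent target-language data.
import torch
import torch.nn as nn

class LanguageDiscriminator(nn.Module):
    """Binary classifier: does an encoded sentence come from the source language?"""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, sentence_repr: torch.Tensor) -> torch.Tensor:
        # sentence_repr: (batch, hidden_dim) pooled encoder outputs
        return torch.sigmoid(self.net(sentence_repr)).squeeze(-1)

def pick_language_independent(disc: LanguageDiscriminator,
                              target_reprs: torch.Tensor,
                              band: float = 0.1) -> torch.Tensor:
    """Keep target-language sentences whose predicted source-language
    probability is close to 0.5, i.e. the least language-dependent ones."""
    with torch.no_grad():
        p_source = disc(target_reprs)
    return (p_source - 0.5).abs() < band  # boolean mask over the batch

# Toy usage with random vectors standing in for pooled encoder outputs
disc = LanguageDiscriminator(hidden_dim=768)  # 768 assumes an mBERT-sized encoder
target_reprs = torch.randn(8, 768)
mask = pick_language_independent(disc, target_reprs)
selected = target_reprs[mask]  # subset kept for pseudo-labeled training
```

In the paper's framework, the sentences kept by this filter are the ones pseudo-labeled by the source-trained NER model and used for further training; the band width trades selection precision against the amount of retained data.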
Related papers
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion [88.59397418187226]
We propose a novel unified open-vocabulary detection method called OV-DINO.
It is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.
We evaluate the performance of the proposed OV-DINO on popular open-vocabulary detection benchmarks.
arXiv Detail & Related papers (2024-07-10T17:05:49Z) - Constrained Decoding for Cross-lingual Label Projection [27.567195418950966]
Cross-lingual transfer using multilingual LLMs has become a popular learning paradigm for low-resource languages with no labeled training data.
However, for NLP tasks that involve fine-grained predictions on words and phrases, the performance of zero-shot cross-lingual transfer learning lags far behind that of supervised fine-tuning.
arXiv Detail & Related papers (2024-02-05T15:57:32Z) - ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity
Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z) - Genre as Weak Supervision for Cross-lingual Dependency Parsing [18.755176247223616]
genre labels are frequently available, yet remain largely unexplored in cross-lingual setups.
We project treebank-level genre information to the finer-grained sentence level.
For 12 low-resource language treebanks, six of which are test-only, our genre-specific methods significantly outperform competitive baselines.
arXiv Detail & Related papers (2021-09-10T08:24:54Z) - On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named
Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z) - Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z) - Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on
Unlabeled Data in Target Language [28.8970132244542]
Cross-lingual NER must leverage knowledge learned from source languages with rich labeled data.
We propose a teacher-student learning method to address such limitations.
Our method outperforms existing state-of-the-art methods for both single-source and multi-source cross-lingual NER.
arXiv Detail & Related papers (2020-04-26T17:22:09Z)