Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá
- URL: http://arxiv.org/abs/2003.08370v2
- Date: Tue, 31 Mar 2020 13:18:17 GMT
- Title: Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá
- Authors: David Ifeoluwa Adelani, Michael A. Hedderich, Dawei Zhu, Esther van
den Berg, Dietrich Klakow
- Abstract summary: Techniques such as distant and weak supervision can be used to create labeled data in a (semi-) automatic way.
We evaluate different embedding approaches and show that distant supervision can be successfully leveraged in a realistic low-resource scenario.
- Score: 23.68953940000046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of labeled training data has limited the development of natural
language processing tools, such as named entity recognition, for many languages
spoken in developing countries. Techniques such as distant and weak supervision
can be used to create labeled data in a (semi-) automatic way. Additionally, to
alleviate some of the negative effects of the errors in automatic annotation,
noise-handling methods can be integrated. Pretrained word embeddings are
another key component of most neural named entity classifiers. With the advent
of more complex contextual word embeddings, an interesting trade-off between
model size and performance arises. While these techniques have been shown to
work well in high-resource settings, we want to study how they perform in
low-resource scenarios. In this work, we perform named entity recognition for
Hausa and Yorùbá, two languages that are widely spoken in several
developing countries. We evaluate different embedding approaches and show that
distant supervision can be successfully leveraged in a realistic low-resource
scenario where it can more than double a classifier's performance.
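To make the distant-supervision idea concrete, below is a minimal sketch of gazetteer-based automatic labeling, one common way to create such (semi-)automatic annotations. The toy gazetteer, the Hausa example sentence, and the `bio_labels` helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: distantly supervising NER with entity gazetteers.
# Gazetteer contents and the example sentence are illustrative only.

GAZETTEERS = {
    "PER": {("muhammadu", "buhari")},
    "LOC": {("lagos",), ("kano",)},
    "ORG": {("arewa24",)},
}

def bio_labels(tokens):
    """Assign BIO tags by longest-match lookup against the gazetteers."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first so multi-token names win.
        for j in range(len(tokens), i, -1):
            span = tuple(lowered[i:j])
            for tag, entries in GAZETTEERS.items():
                if span in entries:
                    labels[i] = f"B-{tag}"
                    for k in range(i + 1, j):
                        labels[k] = f"I-{tag}"
                    i = j
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1
    return labels

tokens = "Muhammadu Buhari ya ziyarci Lagos".split()
print(list(zip(tokens, bio_labels(tokens))))
# [('Muhammadu', 'B-PER'), ('Buhari', 'I-PER'), ('ya', 'O'),
#  ('ziyarci', 'O'), ('Lagos', 'B-LOC')]
```

Labels produced this way are inevitably noisy (ambiguous names, incomplete gazetteers), which is where the noise-handling methods mentioned in the abstract come in.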
Related papers
- Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource and low-resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
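To illustrate the joint character-level training described above, here is a minimal PyTorch sketch in which a single tagger shares all parameters across languages and trains on mixed batches. A plain softmax output layer stands in for the paper's CRF, and the vocabulary, tag set, and sizes are illustrative assumptions.

```python
# Sketch: a single character-level tagger trained jointly on data from a
# high-resource and a low-resource language. A softmax output layer
# stands in for the paper's CRF; vocabulary, tag set, and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

NUM_CHARS, NUM_TAGS, EMB, HID = 100, 9, 32, 64

class CharTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_CHARS, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * HID, NUM_TAGS)

    def forward(self, char_ids):            # (batch, seq_len)
        hidden, _ = self.lstm(self.embed(char_ids))
        return self.out(hidden)             # (batch, seq_len, NUM_TAGS)

model = CharTagger()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

# Joint training: parameters are shared, so batches from both languages
# update the same model; no language identifier is used.
batches = [
    (torch.randint(0, NUM_CHARS, (4, 20)), torch.randint(0, NUM_TAGS, (4, 20))),  # high-resource
    (torch.randint(0, NUM_CHARS, (2, 20)), torch.randint(0, NUM_TAGS, (2, 20))),  # low-resource
]
for chars, tags in batches:
    loss = loss_fn(model(chars).flatten(0, 1), tags.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```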
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems in domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
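As a minimal illustration of this prompting setup, the sketch below builds a zero-shot classification prompt and maps the model's free-form answer back to a fixed label set. The template wording, label set, and the `call_llm` stub are assumptions for illustration, not the paper's actual prompts.

```python
# Hypothetical sketch of zero-shot text classification via prompting.
# Template, labels, and the call_llm stub are illustrative assumptions.

LABELS = ["politics", "sports", "business"]

def build_prompt(text: str) -> str:
    label_list = ", ".join(LABELS)
    return (
        "Classify the following text into one of these categories: "
        f"{label_list}.\n"
        f"Text: {text}\n"
        "Category:"
    )

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (an API or a local model);
    # returns a canned answer so the sketch runs end to end.
    return "sports"

def classify(text: str) -> str:
    answer = call_llm(build_prompt(text)).strip().lower()
    # Constrain the free-form answer to the known label set.
    return answer if answer in LABELS else LABELS[0]

print(classify("The striker scored twice in the final."))  # sports
```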
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Learning from Language Description: Low-shot Named Entity Recognition via Decomposed Framework [23.501276952950366]
We propose a novel NER framework, namely SpanNER, which learns from natural language supervision and enables the identification of never-seen entity classes.
We perform extensive experiments on 5 benchmark datasets and evaluate the proposed method in the few-shot learning, domain transfer and zero-shot learning settings.
The experimental results show that the proposed method brings average improvements of 10%, 23%, and 26% over the best baselines in the few-shot learning, domain transfer, and zero-shot learning settings, respectively.
arXiv Detail & Related papers (2021-09-11T19:52:09Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of the rich unlabeled data available in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
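The distillation step at the core of such teacher-student transfer can be sketched as follows: a student is trained on target-language text to match the teacher's softened token-level distributions. This is a generic knowledge-distillation sketch, not the paper's reinforced iterative procedure; the temperature and tensor shapes are illustrative assumptions.

```python
# Generic knowledge-distillation loss sketch (PyTorch), not the paper's
# exact reinforced iterative method. Shapes/temperature are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Both tensors: (batch, seq_len, num_tags) token-level NER logits.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # batchmean expects (N, C); flatten tokens into the batch dimension.
    kl = F.kl_div(
        student_log_probs.flatten(0, 1),
        teacher_probs.flatten(0, 1),
        reduction="batchmean",
    )
    return kl * t * t  # standard temperature-squared scaling

student = torch.randn(4, 16, 9)  # e.g., 9 BIO tags
teacher = torch.randn(4, 16, 9)
print(distillation_loss(student, teacher))
```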
- How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages [1.7625363344837164]
We introduce the first cross-lingual information extraction pipeline for Sumerian.
We also curate InterpretLR, an interpretability toolkit for low-resource NLP.
Most components of our pipeline can be generalised to any other language to obtain an interpretable execution.
arXiv Detail & Related papers (2021-05-30T12:09:59Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
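The min-max idea can be illustrated with an FGSM-style training step that perturbs input embeddings along the loss gradient's sign (the inner maximization) and then updates the model on the perturbed input (the outer minimization). The toy model, epsilon, and data shapes are illustrative assumptions rather than the paper's exact procedure.

```python
# FGSM-style adversarial training step on embeddings (PyTorch sketch).
# Model, epsilon, and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def adversarial_step(model, embeddings, labels, optimizer, epsilon=1e-2):
    """One training step on a worst-case (label-preserving) perturbation."""
    embeddings = embeddings.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeddings), labels)
    grad, = torch.autograd.grad(loss, embeddings)
    # Inner max: move along the loss gradient's sign, keeping labels fixed.
    perturbed = embeddings + epsilon * grad.sign()
    # Outer min: update the model on the perturbed input.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(perturbed.detach()), labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16 * 32, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16, 32)   # batch of pooled/averaged text embeddings
y = torch.randint(0, 3, (8,))
print(adversarial_step(model, x, y, optimizer))
```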
- Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend the idea of jointly trained acoustic word embedding (AWE) and acoustically grounded written word embedding (AGWE) models to multiple low-resource languages.
We jointly train an AWE model and an AGWE model, using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
arXiv Detail & Related papers (2020-06-24T19:16:02Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the shortage of annotated data for low-resource NER.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)
- Multilingual acoustic word embedding models for processing zero-resource languages [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages.
We then apply it to unseen zero-resource languages.
arXiv Detail & Related papers (2020-02-06T05:53:41Z)