Low-resource Languages: A Review of Past Work and Future Challenges
- URL: http://arxiv.org/abs/2006.07264v1
- Date: Fri, 12 Jun 2020 15:21:57 GMT
- Title: Low-resource Languages: A Review of Past Work and Future Challenges
- Authors: Alexandre Magueresse, Vincent Carles, Evan Heetderks
- Abstract summary: A current problem in NLP is massaging and processing low-resource languages which lack useful training attributes such as supervised data, number of native speakers or experts, etc.
This review paper concisely summarizes previous groundbreaking achievements made towards resolving this problem, and analyzes potential improvements in the context of the overall future research direction.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Related papers
- Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation [38.81102126876936]
This paper introduces a novel retrieval-based method that enhances translation quality for low-resource languages by focusing on key terms.
To evaluate the effectiveness of this method, we conducted experiments translating from English into three low-resource languages: Cherokee, a critically endangered indigenous language of North America; Tibetan, a historically and culturally significant language in Asia; and Manchu, a language with few remaining speakers.
Our comparison with the zero-shot performance of GPT-4o and LLaMA 3.1 405B highlights the significant challenges these models face when translating into low-resource languages.
arXiv Detail & Related papers (2024-11-18T05:41:27Z)
- Challenges and Opportunities of NLP for HR Applications: A Discussion Paper [13.584222421057696]
Machine learning and natural language processing have opened up vast areas of potential application use cases.
We review the use cases for text analytics in the realm of human resources/personnel management.
arXiv Detail & Related papers (2024-05-13T14:09:06Z)
- Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.47046536073682]
We present a review and provide a unified perspective to summarize the recent progress as well as emerging trends in multilingual large language models (MLLMs) literature.
We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
arXiv Detail & Related papers (2024-04-07T11:52:44Z)
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multi-lingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z)
- A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers [76.51245425667845]
Relation extraction (RE) involves identifying the relations between entities from underlying content.
Deep neural networks have dominated the field of RE and made noticeable progress.
This survey is expected to facilitate researchers' collaborative efforts to address the challenges of real-world RE systems.
arXiv Detail & Related papers (2023-06-03T08:39:25Z)
- Out-of-Distribution Generalization in Text Classification: Past, Present, and Future [30.581612475530974]
Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data.
This poses important questions about the robustness of NLP models and their high accuracy, which may be artificially inflated due to their underlying sensitivity to systematic biases.
This paper presents the first comprehensive review of recent progress, methods, and evaluations on this topic.
arXiv Detail & Related papers (2023-05-23T14:26:11Z)
- A Survey on Knowledge-Enhanced Pre-trained Language Models [8.54551743144995]
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs).
Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks.
By integrating external knowledge into PLMs, Knowledge-Enhanced Pre-trained Language Models
arXiv Detail & Related papers (2022-12-27T09:54:14Z)
- Efficient Methods for Natural Language Processing: A Survey [76.34572727185896]
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z)
- Survey of Low-Resource Machine Translation [65.52755521004794]
There are currently around 7000 languages spoken in the world and almost all language pairs lack significant resources for training machine translation models.
There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available.
arXiv Detail & Related papers (2021-09-01T16:57:58Z)
- QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension [41.6087902739702]
This study is the largest survey of the field to date.
We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work.
We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources.
arXiv Detail & Related papers (2021-07-27T10:09:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.