Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP
- URL: http://arxiv.org/abs/2407.09861v1
- Date: Sat, 13 Jul 2024 12:01:52 GMT
- Title: Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP
- Authors: Juli Bakagianni, Kanella Pouli, Maria Gavriilidou, John Pavlopoulos,
- Abstract summary: This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys.
Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks.
By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
- Score: 2.3499129784547663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Processing (NLP) research has traditionally been predominantly focused on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement the broader trend towards multilingualism in NLP by providing foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in literature. This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. We include a classification of Language Resources (LRs), according to their availability, and datasets, according to their annotation, to highlight publicly-available and machine-actionable LRs. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022, providing a comprehensive overview of the current state and challenges of Greek NLP research. We discuss the progress of Greek NLP and outline encountered Greek LRs, classified by availability and usability. As we show, our proposed method helps avoid common pitfalls, such as data leakage and contamination, and to assess language support per NLP task. We consider this systematic literature review of Greek NLP an application of our method that showcases the benefits of a monolingual NLP survey. Similar applications could be regard the myriads of languages whose progress in NLP lags behind that of well-supported languages.
Related papers
- A survey of neural-network-based methods utilising comparable data for finding translation equivalents [0.0]
We present the most common approaches from NLP that endeavour to automatically induce one of the essential dictionary components.
We analyse them from a lexicographic perspective since their viewpoints are crucial for improving the described methods.
This survey encourages a connection between the NLP and lexicography fields as the NLP field can benefit from lexicographic insights.
arXiv Detail & Related papers (2024-10-19T16:10:41Z) - The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
In post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z) - The Ghanaian NLP Landscape: A First Look [9.17372840572907]
Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk.
This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
arXiv Detail & Related papers (2024-05-10T21:39:09Z) - Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.47046536073682]
We present a review and provide a unified perspective to summarize the recent progress as well as emerging trends in multilingual large language models (MLLMs) literature.
We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
arXiv Detail & Related papers (2024-04-07T11:52:44Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Ling-CL: Understanding NLP Models through Linguistic Curricula [17.44112549879293]
We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research.
We develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks.
arXiv Detail & Related papers (2023-10-31T01:44:33Z) - Beyond Counting Datasets: A Survey of Multilingual Dataset Construction
and Necessary Resources [38.814057529254846]
We examine the characteristics of 156 publicly available NLP datasets.
We survey language-proficient NLP researchers and crowd workers per language.
We identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform.
arXiv Detail & Related papers (2022-11-28T18:54:33Z) - Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in natural language processing (NLP) area.
Deep learning requires many labeled data and is less generalizable across domains.
Meta-learning is an arising field in machine learning studying approaches to learn better algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z) - Systematic Inequalities in Language Technology Performance across the
World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z) - Ensuring the Inclusive Use of Natural Language Processing in the Global
Response to COVID-19 [58.720142291102135]
We discuss ways in which current and future NLP approaches can be made more inclusive by covering low-resource languages.
We suggest several future directions for researchers interested in maximizing the positive societal impacts of NLP.
arXiv Detail & Related papers (2021-08-11T12:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.