Related papers: Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP

URL: http://arxiv.org/abs/2407.09861v1
Date: Sat, 13 Jul 2024 12:01:52 GMT
Title: Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP
Authors: Juli Bakagianni, Kanella Pouli, Maria Gavriilidou, John Pavlopoulos
Abstract summary: This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
Score: 2.3499129784547663
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Natural Language Processing (NLP) research has traditionally been predominantly focused on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement the broader trend towards multilingualism in NLP by providing foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in the literature. This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. We include a classification of Language Resources (LRs), according to their availability, and datasets, according to their annotation, to highlight publicly-available and machine-actionable LRs. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022, providing a comprehensive overview of the current state and challenges of Greek NLP research. We discuss the progress of Greek NLP and outline encountered Greek LRs, classified by availability and usability. As we show, our proposed method helps to avoid common pitfalls, such as data leakage and contamination, and to assess language support per NLP task. We consider this systematic literature review of Greek NLP an application of our method that showcases the benefits of a monolingual NLP survey. Similar applications could be considered for the myriads of languages whose progress in NLP lags behind that of well-supported languages.

Related papers

Opportunities and Challenges of Natural Language Processing for Low-Resource Senegalese Languages in Social Science Research [0.6016863427924156]
This paper provides the first comprehensive overview of progress and challenges for the six national languages officially recognized by the Senegalese Constitution: Wolof, Pulaar, Sereer, Joola, Mandingue, and Soninke. We synthesize linguistic, sociotechnical, and infrastructural factors that shape their digital readiness and identify gaps in data, tools, and benchmarks. The paper concludes by outlining a roadmap toward sustainable, community-centered NLP ecosystems for Senegalese languages.
arXiv Detail & Related papers (2025-12-24T20:20:31Z)
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL). It employs self-generated natural language critiques as feedback to guide the step-level search process. This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z)
NaijaNLP: A Survey of Nigerian Low-Resource Languages [0.0]
Three languages -- Hausa, Yorùbá and Igbo -- account for about 60% of the spoken languages in Nigeria. These languages are categorised as low-resource due to insufficient resources to support tasks in computational linguistics. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages.
arXiv Detail & Related papers (2025-02-27T05:48:51Z)
Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects [0.6554326244334868]
This review highlights the scarcity of annotated corpora, limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and desertion of language for digital usage.
arXiv Detail & Related papers (2025-02-24T17:41:48Z)
Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek [2.3499129784547663]
We evaluate the performance of open-source (Llama-70b) and closed-source (GPT-4o mini) large language models on seven core NLP tasks with dataset availability. Second, we expand the scope of Greek NLP by reframing Authorship Attribution as a tool to assess potential data usage by LLMs in pre-training. Third, we showcase a legal NLP case study, where a Summarize, Translate, and Embed (STE) methodology outperforms the traditional TF-IDF approach for clustering long legal texts.
arXiv Detail & Related papers (2025-01-22T12:06:16Z)
A survey of neural-network-based methods utilising comparable data for finding translation equivalents [0.0]
We present the most common approaches from NLP that endeavour to automatically induce one of the essential dictionary components. We analyse them from a lexicographic perspective since their viewpoints are crucial for improving the described methods. This survey encourages a connection between the NLP and lexicography fields as the NLP field can benefit from lexicographic insights.
arXiv Detail & Related papers (2024-10-19T16:10:41Z)
The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers. Our findings reveal a rising involvement of machine learning in NLP since the early nineties. In post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks. We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement. We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient. This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
The Ghanaian NLP Landscape: A First Look [9.17372840572907]
Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
arXiv Detail & Related papers (2024-05-10T21:39:09Z)
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic [51.13706104333848]
We explore the feasibility of repurposing an existing multilingual dataset for a new NLP task. We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic. We aim to help others adapt our approach for the remaining 120 BELEBELE language variants, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.47046536073682]
We present a review and provide a unified perspective to summarize the recent progress as well as emerging trends in multilingual large language models (MLLMs) literature. We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
arXiv Detail & Related papers (2024-04-07T11:52:44Z)
Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
Ling-CL: Understanding NLP Models through Linguistic Curricula [17.44112549879293]
We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research. We develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks.
arXiv Detail & Related papers (2023-10-31T01:44:33Z)
Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources [38.814057529254846]
We examine the characteristics of 156 publicly available NLP datasets. We survey language-proficient NLP researchers and crowd workers per language. We identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform.
arXiv Detail & Related papers (2022-11-28T18:54:33Z)
Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in the natural language processing (NLP) area. Deep learning requires large amounts of labeled data and is less generalizable across domains. Meta-learning is an emerging field in machine learning that studies approaches to learning better algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z)
Systematic Inequalities in Language Technology Performance across the World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies. Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z)
Ensuring the Inclusive Use of Natural Language Processing in the Global Response to COVID-19 [58.720142291102135]
We discuss ways in which current and future NLP approaches can be made more inclusive by covering low-resource languages. We suggest several future directions for researchers interested in maximizing the positive societal impacts of NLP.
arXiv Detail & Related papers (2021-08-11T12:54:26Z)
Low-Resource Adaptation of Neural NLP Models [0.30458514384586405]
This thesis investigates methods for dealing with low-resource scenarios in information extraction and natural language understanding. We develop and adapt neural NLP models to explore a number of research questions concerning NLP tasks with minimal or no training data.
arXiv Detail & Related papers (2020-11-09T12:13:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.