Ling-CL: Understanding NLP Models through Linguistic Curricula
- URL: http://arxiv.org/abs/2310.20121v1
- Date: Tue, 31 Oct 2023 01:44:33 GMT
- Title: Ling-CL: Understanding NLP Models through Linguistic Curricula
- Authors: Mohamed Elgaar, Hadi Amiri
- Abstract summary: We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research.
We develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks.
- Score: 17.44112549879293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We employ a characterization of linguistic complexity from psycholinguistic
and language acquisition research to develop data-driven curricula to
understand the underlying linguistic knowledge that models learn to address NLP
tasks. The novelty of our approach is in the development of linguistic
curricula derived from data, existing knowledge about linguistic complexity,
and model behavior during training. By analyzing several benchmark NLP
datasets, our curriculum learning approaches identify sets of linguistic
metrics (indices) that inform the challenges and reasoning required to address
each task. Our work will inform future research in all NLP areas, allowing
linguistic complexity to be considered early in the research and development
process. In addition, our work prompts an examination of gold standards and
fair evaluation in NLP.
Related papers
- Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges [26.846627984383836]
We provide a review of resources designed for evaluating pragmatic capabilities in NLP.
We analyze task designs, data collection methods, evaluation approaches, and their relevance to real-world applications.
Our survey aims to clarify the landscape of pragmatic evaluation and guide the development of more comprehensive and targeted benchmarks.
arXiv Detail & Related papers (2025-02-17T23:31:38Z) - IOLBENCH: Benchmarking LLMs on Linguistic Reasoning [8.20398036986024]
We introduce IOLBENCH, a novel benchmark derived from International Linguistics Olympiad (IOL) problems.
This dataset encompasses diverse problems testing syntax, morphology, phonology, and semantics.
We find that even the most advanced models struggle to handle the intricacies of linguistic complexity.
arXiv Detail & Related papers (2025-01-08T03:15:10Z) - Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study introduces a generalizable methodology for creating systematic and comprehensive monolingual NLP surveys.
We apply this methodology to Greek NLP (2012-2023), providing a comprehensive overview of its current state and challenges.
arXiv Detail & Related papers (2024-07-13T12:01:52Z) - Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z) - A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - Systematic Inequalities in Language Technology Performance across the
World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z) - Low-Resource Adaptation of Neural NLP Models [0.30458514384586405]
This thesis investigates methods for dealing with low-resource scenarios in information extraction and natural language understanding.
We develop and adapt neural NLP models to explore a number of research questions concerning NLP tasks with minimal or no training data.
arXiv Detail & Related papers (2020-11-09T12:13:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.