Evaluating Language Tools for Fifteen EU-official Under-resourced
Languages
- URL: http://arxiv.org/abs/2010.12428v1
- Date: Fri, 23 Oct 2020 14:21:03 GMT
- Title: Evaluating Language Tools for Fifteen EU-official Under-resourced
Languages
- Authors: Diego Alves, Gaurish Thakkar, Marko Tadić
- Abstract summary: This article presents the results of the evaluation campaign of language tools available for fifteen EU-official under-resourced languages.
The evaluation was conducted within the MSC ITN CLEOPATRA action, which aims at building cross-lingual event-centric knowledge processing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article presents the results of the evaluation campaign of language
tools available for fifteen EU-official under-resourced languages. The
evaluation was conducted within the MSC ITN CLEOPATRA action, which aims at
building cross-lingual event-centric knowledge processing on top of the
application of linguistic processing chains (LPCs) for at least 24 EU-official
languages. In this campaign, we concentrated on three existing NLP platforms
(Stanford CoreNLP, NLP Cube, UDPipe) that all provide models for
under-resourced languages and in this first run we covered 15 under-resourced
languages for which the models were available. We present the design of the
evaluation campaign and present the results as well as discuss them. We
considered the difference between reported and our tested results within a
single percentage point as being within the limits of acceptable tolerance and
thus consider this result as reproducible. However, for a number of languages,
the results are below what was reported in the literature, and in some cases,
our testing results are even better than the ones reported previously.
Particularly problematic was the evaluation of NERC systems. One of the reasons
is the absence of a universally or cross-lingually applicable named-entity
classification scheme that would serve the NERC task in different languages,
analogous to the Universal Dependencies scheme in the parsing task. Building
such a scheme has become one of our future research directions.
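To make the reproducibility criterion above concrete, the sketch below runs one of the named platforms (UDPipe, via the ufal.udpipe Python bindings) as a simple linguistic processing chain and then applies the one-percentage-point tolerance rule. The model path, sample sentence, and score values are illustrative placeholders, not figures from the paper.

```python
# Illustrative sketch only: the model path and the scores below are hypothetical
# placeholders, not results from the paper. Requires `pip install ufal.udpipe`.
from ufal.udpipe import Model, Pipeline, ProcessingError

# Run a UDPipe processing chain (tokenization, tagging, parsing) on one sentence.
model = Model.load("path/to/under-resourced-language.udpipe")  # downloaded UDPipe model
if model is None:
    raise RuntimeError("Cannot load UDPipe model")

pipeline = Pipeline(model, "tokenize", Pipeline.DEFAULT, Pipeline.DEFAULT, "conllu")
error = ProcessingError()
conllu_output = pipeline.process("Ovo je primjer rečenice.", error)  # Croatian sample
if error.occurred():
    raise RuntimeError(error.message)
print(conllu_output)

# Reproducibility rule from the abstract: a reported score and a re-tested score
# are treated as matching if they differ by at most one percentage point.
def is_reproducible(reported: float, tested: float, tolerance: float = 1.0) -> bool:
    """Scores are percentages (e.g. LAS or NERC F1 in the range 0-100)."""
    return abs(reported - tested) <= tolerance

# Hypothetical numbers, for illustration only.
print(is_reproducible(84.3, 83.9))  # True: within tolerance, counted as reproduced
print(is_reproducible(84.3, 81.0))  # False: below the reported result
```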
Related papers
- Quantifying Multilingual Performance of Large Language Models Across Languages
Large Language Models (LLMs) perform better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate.
We propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations.
Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores.
arXiv Detail & Related papers (2024-04-17T16:53:16Z)
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
arXiv Detail & Related papers (2024-03-16T20:18:36Z)
- High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
We explore the extent to which pretrained large language models (LLMs) can bridge the performance gap for under-resourced languages.
We find that LLMs easily set the state of the art for the under-resourced languages by substantial margins.
For all our languages, human evaluation shows on-a-par performance with humans for our best systems, but BLEU scores collapse compared to English.
arXiv Detail & Related papers (2024-02-19T16:29:40Z)
- Natural Language Processing for Dialects of a Language: A Survey
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- A Multilingual Perspective Towards the Evaluation of Attribution Methods in Natural Language Inference
We present a multilingual approach for evaluating attribution methods for the Natural Language Inference (NLI) task in terms of faithfulness and plausibility.
First, we introduce a novel cross-lingual strategy to measure faithfulness based on word alignments, which eliminates the drawbacks of erasure-based evaluations.
We then perform a comprehensive evaluation of attribution methods, considering different output mechanisms and aggregation methods.
arXiv Detail & Related papers (2022-04-11T22:11:05Z)
- Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus
We describe the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments.
We also present the first cross-lingual experiments, aimed at assessing whether multilingual transformer-based approaches can benefit from using sentences in two languages during fine-tuning.
arXiv Detail & Related papers (2021-09-24T16:18:53Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)