Language Varieties of Italy: Technology Challenges and Opportunities
- URL: http://arxiv.org/abs/2209.09757v2
- Date: Mon, 20 Nov 2023 16:54:55 GMT
- Title: Language Varieties of Italy: Technology Challenges and Opportunities
- Authors: Alan Ramponi
- Abstract summary: Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe.
Most local languages and dialects in Italy are at risk of disappearing within a few generations.
- Score: 4.199528104335137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Italy is characterized by a one-of-a-kind linguistic diversity landscape in
Europe, which implicitly encodes local knowledge, cultural traditions, artistic
expressions and history of its speakers. However, most local languages and
dialects in Italy are at risk of disappearing within a few generations. The NLP
community has recently begun to engage with endangered languages, including
those of Italy. Yet, most efforts assume that these varieties are
under-resourced language monoliths with an established written form and
homogeneous functions and needs, and thus highly interchangeable with each
other and with high-resource, standardized languages. In this paper, we
introduce the linguistic context of Italy and challenge the default
machine-centric assumptions of NLP for Italy's language varieties. We advocate
for a shift in the paradigm from machine-centric to speaker-centric NLP, and
provide recommendations and opportunities for work that prioritizes languages
and their speakers over technological advances. To facilitate the process, we
finally propose building a local community towards responsible, participatory
efforts aimed at supporting the vitality of languages and dialects of Italy.
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach for enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from the top layers of LLMs (a toy subspace-steering sketch follows this entry).
It achieves superior results with far fewer computational resources than existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
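As a rough, hypothetical illustration of the subspace manipulation mentioned above, the PyTorch snippet below splits hidden states into a projection onto a shared (language-agnostic) subspace and a language-specific residual, then rescales the residual. The `shared_basis` matrix, the `boost` factor, and the idea of estimating the basis offline (e.g., via SVD/PCA over multilingual activations) are assumptions made for this sketch and do not reproduce the actual Lens procedure.

```python
import torch

def steer_hidden_states(h: torch.Tensor, shared_basis: torch.Tensor,
                        boost: float = 1.5) -> torch.Tensor:
    """Rescale the language-specific residual of top-layer hidden states.

    Hypothetical sketch only: `shared_basis` is assumed to be an orthonormal
    (hidden_dim, k) matrix estimated offline; the real Lens method differs.
    """
    # Component of h lying in the shared (language-agnostic) subspace.
    shared = h @ shared_basis @ shared_basis.T
    # Everything outside that subspace is treated as language-specific.
    specific = h - shared
    return shared + boost * specific

# Toy usage with random tensors standing in for top-layer activations.
h = torch.randn(4, 16, 768)                        # (batch, seq, hidden)
basis, _ = torch.linalg.qr(torch.randn(768, 32))   # toy orthonormal basis
print(steer_hidden_states(h, basis).shape)         # torch.Size([4, 16, 768])
```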
- NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural [0.0]
NusaBERT builds upon IndoBERT by incorporating vocabulary expansion and leveraging a diverse multilingual corpus that includes regional languages and dialects (a minimal vocabulary-expansion sketch follows this entry).
Through rigorous evaluation across a range of benchmarks, NusaBERT demonstrates state-of-the-art performance in tasks involving multiple languages of Indonesia.
arXiv Detail & Related papers (2024-03-04T08:05:34Z)
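To illustrate the vocabulary-expansion step in general terms, here is a minimal Hugging Face Transformers sketch that adds new tokens to an existing tokenizer and resizes the model's embedding matrix accordingly. The base checkpoint name and the example tokens are placeholders, and the snippet is not the actual NusaBERT training recipe; continued pre-training on the expanded corpus would follow.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder IndoBERT checkpoint; the actual NusaBERT setup may differ.
checkpoint = "indobenchmark/indobert-base-p1"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Toy regional-language tokens assumed to be missing from the vocabulary.
new_tokens = ["matur", "nuwun", "hatur", "nuhun"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new token ids get freshly initialized
# rows, which would then be learned during continued pre-training.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```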
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language [7.214355350362308]
The LLaMA (Large Language Model Meta AI) family represents a novel advancement in the field of natural language processing.
This study contributes to Language Adaptation strategies for the Italian language by introducing the LLaMAntino family of Italian LLMs.
arXiv Detail & Related papers (2023-12-15T18:06:22Z)
- Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionately weighted in favor of Standard American English.
By disentangling dialect-specific and cross-dialectal information, HyperLoRA improves generalization to unseen dialects in a task-agnostic fashion (a minimal low-rank adapter sketch follows this entry).
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
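As a minimal sketch of the low-rank adapter idea behind this line of work, the PyTorch module below wraps a frozen linear layer with a small trainable low-rank update. The class name, rank, and scaling are illustrative assumptions, and HyperLoRA's hypernetwork that produces adapters from dialect information is not modeled here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # keep pretrained weights frozen
        # Low-rank factors: effective weight is W + (alpha / rank) * B @ A.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Usage: wrap a projection layer of a pretrained model; only A and B train.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
print(layer(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```

Because B is initialized to zero, the low-rank update starts out as a no-op, so the wrapped layer initially behaves exactly like the frozen base layer.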
- Towards Bridging the Digital Language Divide [4.234367850767171]
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z)
- Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction [102.13536517783837]
Most languages from the Americas are severely under-resourced, with a limited amount of parallel and monolingual data, if any.
We discuss recent advances, findings, and open questions resulting from the NLP community's increased interest in these languages.
arXiv Detail & Related papers (2023-06-11T23:27:47Z)
- Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese [54.00582760714034]
Cross-lingual NLP transfer can be improved by exploiting data and models of high-resource languages.
We release a new web corpus of Faroese and Faroese datasets for named entity recognition (NER), semantic text similarity (STS) and new language models trained on all Scandinavian languages.
arXiv Detail & Related papers (2023-04-18T08:42:38Z)
- Not always about you: Prioritizing community needs when developing endangered language technology [5.670857685983896]
We discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face.
We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics.
arXiv Detail & Related papers (2022-04-12T05:59:39Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art task-oriented dialogue (ToD) models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.