Related papers: Diversidade linguística e inclusão digital: desafios para uma IA brasileira (Linguistic Diversity and Digital Inclusion: Challenges for a Brazilian AI)

Related papers

Losing our Tail -- Again: On (Un)Natural Selection And Multilingual Large Language Models [0.8702432681310399]
I argue that the tails of our linguistic distributions are vanishing, and with them, the narratives and identities they carry. This is a call to resist linguistic flattening and to reimagine NLP as a field that encourages, values and protects expressive multilingual lexical and linguistic diversity and creativity.
arXiv Detail & Related papers (2025-07-05T07:36:49Z)
Generative AI and Large Language Models in Language Preservation: Opportunities and Challenges [0.0]
Generative AI and large language models (LLMs) have emerged as powerful tools in language preservation. This paper examines the role of generative AI and LLMs in preserving endangered languages, highlighting the risks and challenges associated with their use.
arXiv Detail & Related papers (2025-01-20T14:03:40Z)
Benchmarking Linguistic Diversity of Large Language Models [14.824871604671467]
This paper emphasizes the importance of examining the preservation of human linguistic richness by language models. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives.
arXiv Detail & Related papers (2024-12-13T16:46:03Z)
LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models [62.47865866398233]
This white paper proposes a framework to generate linguistic tools for low-resource languages. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity.
arXiv Detail & Related papers (2024-11-20T16:59:41Z)
Standard Language Ideology in AI-Generated Language [1.2815904071470705]
We explore standard language ideology in language generated by large language models (LLMs). We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language.
arXiv Detail & Related papers (2024-06-13T01:08:40Z)
What is "Typological Diversity" in NLP? [7.58293347591642]
We introduce metrics to approximate the diversity of language selection along several axes. We show that skewed language selection can lead to overestimated multilingual performance.
arXiv Detail & Related papers (2024-02-06T18:29:39Z)
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice [4.234367850767171]
We show that many attempts produce flawed solutions that adhere to a hard-wired representational preference for certain languages. As we show throughout the paper, techno-linguistic bias can result in systems that can only express concepts that are part of the language and culture of dominant powers. We argue that at the root of this problem lies a systematic tendency of technology developer communities to apply a simplistic understanding of diversity.
arXiv Detail & Related papers (2023-07-25T16:08:27Z)
Towards Bridging the Digital Language Divide [4.234367850767171]
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages. We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented. We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z)
Democratizing Ethical Assessment of Natural Language Generation Models [0.0]
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context. Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms. Ethical assessment of these models is therefore critical. This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models.
arXiv Detail & Related papers (2022-06-30T12:20:31Z)
Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis. We cluster all the target languages into multiple groups and name each group a representation sprachbund. Experiments are conducted on cross-lingual benchmarks, and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry. Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source. We observe that our representations embed typology and strengthen correlations with language relationships. We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.