Artificial intelligence is creating a new global linguistic hierarchy
- URL: http://arxiv.org/abs/2602.12018v1
- Date: Thu, 12 Feb 2026 14:50:44 GMT
- Title: Artificial intelligence is creating a new global linguistic hierarchy
- Authors: Giulia Occhini, Kumiko Tanaka-Ishii, Anna Barford, Refael Tikochinski, Songbo Hu, Roi Reichart, Yijie Zhou, Hannah Claus, Ulla Petti, Ivan Vulić, Ramit Debnath, Anna Korhonen,
- Abstract summary: We present a global longitudinal analysis of social, economic and infrastructural conditions across languages.<n>We find that despite efforts to broaden the reach of language technologies, the dominance of a handful of languages is exacerbating disparities.<n>We introduce the Language AI Readiness Index (EQUATE), which maps the state of technological, socio-economic, and infrastructural prerequisites for AI deployment across languages.
- Score: 34.80252741178931
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Artificial intelligence (AI) has the potential to transform healthcare, education, governance and socioeconomic equity, but its benefits remain concentrated in a small number of languages (Bender, 2019; Blasi et al., 2022; Joshi et al., 2020; Ranathunga and de Silva, 2022; Young, 2015). Language AI - the technologies that underpin widely-used conversational systems such as ChatGPT - could provide major benefits if available in people's native languages, yet most of the world's 7,000+ linguistic communities currently lack access and face persistent digital marginalization. Here we present a global longitudinal analysis of social, economic and infrastructural conditions across languages to assess systemic inequalities in language AI. We first analyze the existence of AI resources for 6003 languages. We find that despite efforts of the community to broaden the reach of language technologies (Bapna et al., 2022; Costa-Jussà et al., 2022), the dominance of a handful of languages is exacerbating disparities on an unprecedented scale, with divides widening exponentially rather than narrowing. Further, we contrast the longitudinal diffusion of AI with that of earlier IT technologies, revealing a distinctive hype-driven pattern of spread. To translate our findings into practical insights and guide prioritization efforts, we introduce the Language AI Readiness Index (EQUATE), which maps the state of technological, socio-economic, and infrastructural prerequisites for AI deployment across languages. The index highlights communities where capacity exists but remains underutilized, and provides a framework for accelerating more equitable diffusion of language AI. Our work contributes to setting the baseline for a transition towards more sustainable and equitable language technologies.
Related papers
- AI+HW 2035: Shaping the Next Decade [135.53570243498987]
Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined.<n>This vision paper lays out a 10-year roadmap for AI+HW co-design and co-development, spanning algorithms, architectures, systems, and sustainability.<n>We identify key challenges and opportunities, candidly assess potential obstacles and pitfalls, and propose integrated solutions.
arXiv Detail & Related papers (2026-03-05T14:36:33Z) - Invisible Languages of the LLM Universe [1.2891210250935148]
2,000 languages with millions of speakers remain effectively invisible in digital ecosystems.<n>Our analysis reveals that English dominance in AI is not a technical necessity but an artifact of power structures that systematically exclude marginalized linguistic knowledge.
arXiv Detail & Related papers (2025-10-13T16:00:15Z) - Deaf in AI: AI language technologies and the erosion of linguistic rights [0.0]
This paper explores the interplay of AI language technologies, sign language interpreting, and linguistic access.<n>It calls for deaf-led approaches to foster AI systems that remain equitable, inclusive, and trustworthy.
arXiv Detail & Related papers (2025-05-05T09:58:59Z) - Generative AI, Pragmatics, and Authenticity in Second Language Learning [0.0]
There are obvious benefits to integrating generative AI (artificial intelligence) into language learning and teaching.
However, due to how AI systems under-stand human language, they lack the lived experience to be able to use language with the same social awareness as humans.
There are built-in linguistic and cultural biases based on their training data which is mostly in English and predominantly from Western sources.
arXiv Detail & Related papers (2024-10-18T11:58:03Z) - Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences [31.62071644137294]
We discuss the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP.
We report encouraging results in the development of high-quality machine learning translators for Indigenous languages.
We present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing.
arXiv Detail & Related papers (2024-07-17T14:46:37Z) - Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions [67.60397632819202]
Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal.
We identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI.
arXiv Detail & Related papers (2024-04-17T02:57:42Z) - Distributed agency in second language learning and teaching through generative AI [0.0]
ChatGPT can provide informal second language practice through chats in written or voice forms.
Instructors can use AI to build learning and assessment materials in a variety of media.
arXiv Detail & Related papers (2024-03-29T14:55:40Z) - Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future [59.78608958395464]
We build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets.
Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects.
We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
arXiv Detail & Related papers (2024-02-28T00:22:42Z) - Understanding Natural Language Understanding Systems. A Critical
Analysis [91.81211519327161]
The development of machines that guillemotlefttalk like usguillemotright, also known as Natural Language Understanding (NLU) systems, is the Holy Grail of Artificial Intelligence (AI)
But never has the trust that we can build guillemotlefttalking machinesguillemotright been stronger than the one engendered by the last generation of NLU systems.
Are we at the dawn of a new era, in which the Grail is finally closer to us?
arXiv Detail & Related papers (2023-03-01T08:32:55Z) - Systematic Inequalities in Language Technology Performance across the
World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.