AI Diffusion in Low Resource Language Countries
- URL: http://arxiv.org/abs/2511.02752v1
- Date: Tue, 04 Nov 2025 17:31:39 GMT
- Title: AI Diffusion in Low Resource Language Countries
- Authors: Amit Misra, Syed Waqas Zamir, Wassim Hamidouche, Inbal Becker-Reshef, Juan Lavista Ferres
- Abstract summary: Low-Resource Language Countries (LRLCs) have a share of AI users that is approximately 20% lower relative to their baseline. Linguistic accessibility is a significant, independent barrier to equitable AI diffusion.
- Score: 10.939989206795047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) is diffusing globally at unprecedented speed, but adoption remains uneven. Frontier Large Language Models (LLMs) are known to perform poorly on low-resource languages due to data scarcity. We hypothesize that this performance deficit reduces the utility of AI, thereby slowing adoption in Low-Resource Language Countries (LRLCs). To test this, we use a weighted regression model to isolate the language effect from socioeconomic and demographic factors, finding that LRLCs have a share of AI users that is approximately 20% lower relative to their baseline. These results indicate that linguistic accessibility is a significant, independent barrier to equitable AI diffusion.
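The abstract's core method is a weighted regression that isolates a low-resource-language indicator from socioeconomic and demographic controls. Below is a minimal sketch of that kind of model using statsmodels; the dataset, column names (`lrlc`, `ai_user_share`, `gdp_per_capita`, `internet_penetration`, `median_age`, `population`), and the population weighting are illustrative assumptions, not the authors' actual specification.

```python
# Illustrative weighted least squares (WLS) regression: estimate the
# effect of being a Low-Resource Language Country (LRLC) on AI adoption
# while controlling for socioeconomic and demographic factors.
# All column names and the weighting choice are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("country_ai_adoption.csv")  # hypothetical country-level dataset

# Outcome: share of AI users; regressors: LRLC indicator plus controls.
X = sm.add_constant(df[["lrlc", "gdp_per_capita",
                        "internet_penetration", "median_age"]])
y = df["ai_user_share"]

# Weight observations, e.g., by population, so large countries count more.
model = sm.WLS(y, X, weights=df["population"]).fit()

# The coefficient on `lrlc` is the language effect net of the controls;
# the paper reports roughly a 20% lower share of AI users for LRLCs.
print(model.summary())
```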
Related papers
- Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data [5.324230283177818]
We present a systematic investigation into cross-lingual continual pretraining for low-resource languages. We construct a 3,000-hour multilingual corpus through a scalable unlabeled data collection pipeline. We employ targeted continual pretraining combined with morphologically-aware tokenization to develop a 300M-parameter model that achieves performance comparable to systems 5 times larger.
arXiv Detail & Related papers (2025-12-08T08:16:34Z)
- AI-Assisted Writing Is Growing Fastest Among Non-English-Speaking and Less Established Scientists [2.9557942678513007]
We analyze over two million full-text biomedical publications from PubMed Central from 2021 to 2024. We observe a significant post-ChatGPT surge in AI-assisted writing, with adoption growing fastest in contexts where language barriers are most pronounced. Increased AI usage was associated with a modest increase in productivity, narrowing the publication gap between scientists from English-speaking and non-English-speaking countries.
arXiv Detail & Related papers (2025-11-19T21:00:18Z)
- When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance. Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
- Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
- Think Outside the Data: Colonial Biases and Systemic Issues in Automated Moderation Pipelines for Low-Resource Languages [13.011117871938561]
AI-driven moderation systems struggle with low-resource languages spoken in the Global South. Our findings reveal that, beyond data scarcity, socio-political factors such as tech companies' monopoly on user data exacerbate historic inequities. We argue these limitations are not just technical gaps caused by "data scarcity" but reflect structural inequities rooted in colonial suppression of non-Western languages.
arXiv Detail & Related papers (2025-01-23T17:01:53Z)
- Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust [0.0]
This work focuses on the development of a multilingual non-profit IR system for the Islamic domain.
By employing methods such as continued pre-training for domain adaptation and language reduction to decrease model size, a lightweight multilingual retrieval model was prepared; a toy sketch of the vocabulary-trimming idea follows this entry.
arXiv Detail & Related papers (2024-11-09T11:37:18Z)
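As a rough illustration of the language-reduction step mentioned above, the sketch below trims an embedding table to the tokens actually used by the target languages, which is one common way to shrink a multilingual model; the sizes, token ids, and the trimming-only approach are assumptions, not the paper's exact procedure.

```python
# Toy "language reduction" via vocabulary trimming: keep only the
# embedding rows for tokens observed in the target-language corpus.
# Sizes and token ids are made up for illustration.
import torch

vocab_size, dim = 250_000, 768
embeddings = torch.randn(vocab_size, dim)  # stand-in for a real embedding table

# Ids of tokens that actually occur in the retained languages.
kept_token_ids = torch.tensor([0, 1, 2, 57, 4_096, 91_000])
reduced = embeddings[kept_token_ids]       # shape: (len(kept_token_ids), dim)

# Remap old token ids to new compact ids for the reduced tokenizer.
old_to_new = {int(old): new for new, old in enumerate(kept_token_ids)}
print(reduced.shape, old_to_new)
```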
We show that non-English speakers face higher costs when using OpenAI's GPT models via APIs because of how the system processes the input: tokenization.
Around 1.5 billion people, speaking languages primarily from lower-middle-income countries, could incur costs that are 4 to 6 times higher than those faced by English speakers.
This underscores the need for fairer algorithm development to benefit all linguistic groups; a rough token-count comparison illustrating the gap follows this entry.
arXiv Detail & Related papers (2024-10-14T16:11:04Z)
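The sketch below makes the entry's point concrete by counting tokens for the same short sentence in several languages with the open-source tiktoken library; API pricing is per token, so more tokens means higher cost. The sample sentences and their translations are illustrative.

```python
# Compare token counts for (roughly) the same sentence across languages.
# Requires `pip install tiktoken`; translations are illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family of GPT-4-era models

samples = {
    "English": "How are you today?",
    "Hindi": "आप आज कैसे हैं?",
    "Burmese": "ဒီနေ့ နေကောင်းလား။",
}

base = len(enc.encode(samples["English"]))
for lang, text in samples.items():
    n = len(enc.encode(text))
    print(f"{lang}: {n} tokens ({n / base:.1f}x English)")
```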
- SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition [55.2480439325792]
Speech Meta In-Context LEarning (SMILE) is an innovative framework that combines meta-learning with speech in-context learning (SICL). We show that SMILE consistently outperforms baseline methods in training-free few-shot multilingual ASR tasks.
arXiv Detail & Related papers (2024-09-16T16:04:16Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data; a minimal weight-interpolation sketch follows this entry.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
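Below is a minimal sketch of the simplest form of model merging, linear interpolation of parameters between two checkpoints; the file names are hypothetical and the uniform blend is an assumption, since the paper studies merging in more depth than a single average.

```python
# Merge two models by linearly interpolating their weights: a general
# instruction-tuned model and one continually pre-trained on a
# low-resource language. Checkpoint paths are hypothetical.
import torch

sd_general = torch.load("llama2_7b_instruct.pt")  # hypothetical state dict
sd_lowres = torch.load("llama2_7b_lowres_ct.pt")  # hypothetical state dict

alpha = 0.5  # interpolation weight between the two parents
merged = {
    name: alpha * sd_general[name] + (1 - alpha) * sd_lowres[name]
    for name in sd_general
}

torch.save(merged, "llama2_7b_merged.pt")
```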
- LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages [1.149936119867417]
Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling.
We propose leveraging the potential of LLMs in the active learning loop for data annotation.
Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data requirements; a schematic version of the annotation loop follows this entry.
arXiv Detail & Related papers (2024-04-02T19:34:22Z)
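The sketch below shows the general shape of such a loop: train on a small labeled set, pick the unlabeled examples the model is least confident about, send them to an LLM for labels, and repeat. The `llm_annotate` stub, the tiny toy data, and least-confidence sampling are assumptions standing in for the paper's actual setup.

```python
# Schematic active-learning loop with an LLM as the annotator.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def llm_annotate(texts):
    """Stub: in practice, prompt an LLM (e.g., GPT-4-Turbo) for labels."""
    return [0 for _ in texts]  # placeholder labels

labeled_texts, labels = ["good", "bad"], [1, 0]  # tiny seed set
pool = ["okay", "terrible", "great", "meh"]      # unlabeled pool

vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
for _ in range(3):  # a few active-learning rounds
    X = vec.fit_transform(labeled_texts)
    clf = LogisticRegression().fit(X, labels)

    if not pool:
        break
    probs = clf.predict_proba(vec.transform(pool))
    uncertainty = 1 - probs.max(axis=1)   # least-confidence sampling
    idx = int(np.argmax(uncertainty))     # query one example per round

    labeled_texts.append(pool.pop(idx))
    labels.extend(llm_annotate([labeled_texts[-1]]))
```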
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- Improving Candidate Generation for Low-resource Cross-lingual Entity Linking [81.41804263432684]
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
In this paper, we propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios; a toy candidate-generation baseline follows this entry.
arXiv Detail & Related papers (2020-03-03T05:32:09Z)
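As a toy illustration of what candidate generation in XEL does, the snippet below ranks KB entries against a possibly misspelled or transliterated mention by character-level similarity; this naive baseline is an assumption for illustration, not the paper's proposed method.

```python
# Rank knowledge-base entries as linking candidates for a mention
# using character-level similarity (a naive baseline).
from difflib import SequenceMatcher

kb_entries = ["Barack Obama", "Michelle Obama", "Barcelona"]  # toy KB

def candidates(mention, kb, k=2):
    scored = [(SequenceMatcher(None, mention.lower(), e.lower()).ratio(), e)
              for e in kb]
    return [e for _, e in sorted(scored, reverse=True)[:k]]

print(candidates("Barak Obama", kb_entries))  # noisy mention from source text
```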
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.