Related papers: Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

URL: http://arxiv.org/abs/2412.04497v2
Date: Mon, 09 Dec 2024 03:00:42 GMT
Title: Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
Authors: Tianyang Zhong, Zhenyuan Yang, Zhengliang Liu, Ruidong Zhang, Yiheng Liu, Haiyang Sun, Yi Pan, Yiwei Li, Yifan Zhou, Hanqi Jiang, Junhao Chen, Tianming Liu
Abstract summary: Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations. Recent advancements in large language models (LLMs) offer transformative opportunities for addressing these challenges.
Score: 23.773194690783512
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations, which hinder their comprehensive study and preservation. Recent advancements in large language models (LLMs) offer transformative opportunities for addressing these challenges, enabling innovative methodologies in linguistic, historical, and cultural research. This study systematically evaluates the applications of LLMs in low-resource language research, encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis. By analyzing technical frameworks, current methodologies, and ethical considerations, this paper identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity. Given the cultural, historical, and linguistic richness inherent in low-resource languages, this work emphasizes interdisciplinary collaboration and the development of customized models as promising avenues for advancing research in this domain. By underscoring the potential of integrating artificial intelligence with the humanities to preserve and study humanity's linguistic and cultural heritage, this study fosters global efforts towards safeguarding intellectual diversity.

Related papers

Disentangling Language and Culture for Evaluating Multilingual Large Language Models [48.06219053598005]
This paper introduces a Dual Evaluation Framework to comprehensively assess the multilingual capabilities of LLMs. By decomposing the evaluation along the dimensions of linguistic medium and cultural context, this framework enables a nuanced analysis of LLMs' ability to process questions cross-lingually.
arXiv Detail & Related papers (2025-05-30T14:25:45Z)
Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions [4.524096445909663]
Low-resource languages in Africa remain significantly underrepresented in both research and practical applications. This study investigates the major challenges hindering the development of ASR systems for these languages.
arXiv Detail & Related papers (2025-05-16T20:57:39Z)
Uncovering inequalities in new knowledge learning by large language models across different languages [66.687369838071]
We show that low-resource languages consistently face disadvantages across all four dimensions. We aim to raise awareness of linguistic inequalities in LLMs' new knowledge learning, fostering the development of more inclusive and equitable future LLMs.
arXiv Detail & Related papers (2025-03-06T03:41:47Z)
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems [0.4218593777811082]
Language is a cornerstone of cultural identity, yet globalization and the dominance of major languages have placed nearly 3,000 languages at risk of extinction. Existing AI-driven translation models prioritize efficiency but often fail to capture cultural nuances, idiomatic expressions, and historical significance. We propose a multi-agent AI framework designed for culturally adaptive translation in underserved language communities.
arXiv Detail & Related papers (2025-03-05T06:43:59Z)
An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI). This paper explores potential areas where statisticians can make important contributions to the development of LLMs. We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
Generative AI and Large Language Models in Language Preservation: Opportunities and Challenges [0.0]
Generative AI and large-scale language models (LLMs) have emerged as powerful tools in language preservation. This paper examines the role of generative AI and LLMs in preserving endangered languages, highlighting the risks and challenges associated with their use.
arXiv Detail & Related papers (2025-01-20T14:03:40Z)
LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models [62.47865866398233]
This white paper proposes a framework to generate linguistic tools for low-resource languages. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity.
arXiv Detail & Related papers (2024-11-20T16:59:41Z)
Toward Cultural Interpretability: A Linguistic Anthropological Framework for Describing and Evaluating Large Language Models (LLMs) [13.71024600466761]
This article proposes a new integration of linguistic anthropology and machine learning (ML). We show the theoretical feasibility of a new, conjoint field of inquiry, cultural interpretability (CI). CI emphasizes how the dynamic relationship between language and culture makes contextually sensitive, open-ended conversation possible.
arXiv Detail & Related papers (2024-11-07T22:01:50Z)
Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey [2.5459710368096586]
This survey provides a comprehensive overview of the current research on low-resource language misinformation detection. We review the existing datasets, methodologies, and tools used in these domains, identifying key challenges related to: data resources, model development, cultural and linguistic context, real-world applications, and research efforts. Our findings underscore the need for robust, inclusive systems capable of addressing misinformation across diverse linguistic and cultural contexts.
arXiv Detail & Related papers (2024-10-24T03:02:03Z)
Recent Advancements and Challenges of Turkic Central Asian Language Processing [4.189204855014775]
Research in NLP for Central Asian Turkic languages faces typical low-resource language challenges. Recent advancements have included the collection of language-specific datasets and the development of models for downstream tasks.
arXiv Detail & Related papers (2024-07-06T08:58:26Z)
Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks. We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts. We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient. This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
Saving the legacy of Hero Ibash: Evaluating Four Language Models for Aminoacian [0.8158530638728501]
This study assesses four cutting-edge language models in the underexplored Aminoacian language. It scrutinizes their adaptability, effectiveness, and limitations in text generation, semantic coherence, and contextual understanding.
arXiv Detail & Related papers (2024-02-28T07:22:13Z)
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition. Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
History, Development, and Principles of Large Language Models-An Introductory Survey [15.875687167037206]
Language models serve as a cornerstone in natural language processing (NLP). Over extensive research spanning decades, language modeling has progressed from initial statistical language models (SLMs) to the contemporary landscape of large language models (LLMs).
arXiv Detail & Related papers (2024-02-10T01:18:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.