ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage
- URL: http://arxiv.org/abs/2505.23831v1
- Date: Wed, 28 May 2025 09:02:13 GMT
- Title: ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage
- Authors: Wenhao Ye, Tiansheng Zheng, Yue Qi, Wenhua Zhao, Xiyu Wang, Xue Zhao, Jiacheng He, Yaya Zheng, Dongbo Wang
- Abstract summary: The study uses a substantial corpus of open-source Chinese ICH data to develop a large language model, ICH-Qwen, for the ICH domain. The model employs natural language understanding and knowledge reasoning capabilities of large language models, augmented with synthetic data and fine-tuning techniques. It is anticipated that the model will provide intelligent solutions for the protection, inheritance and dissemination of intangible cultural heritage.
- Score: 8.58469813632992
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The intangible cultural heritage (ICH) of China, a cultural asset transmitted across generations by various ethnic groups, serves as a significant testament to the evolution of human civilization and holds irreplaceable value for the preservation of historical lineage and the enhancement of cultural self-confidence. However, the rapid pace of modernization poses formidable challenges to ICH, including the threat of damage, disappearance and discontinuity of inheritance. China has the highest number of items on the UNESCO Intangible Cultural Heritage List, which is indicative of the nation's abundant cultural resources and emphasises the pressing need for ICH preservation. In recent years, rapid advances in large language modelling have provided a novel technological approach for the preservation and dissemination of ICH. This study utilises a substantial corpus of open-source Chinese ICH data to develop a large language model, ICH-Qwen, for the ICH domain. The model employs the natural language understanding and knowledge reasoning capabilities of large language models, augmented with synthetic data and fine-tuning techniques. The experimental results demonstrate the efficacy of ICH-Qwen in executing tasks specific to the ICH domain. It is anticipated that the model will provide intelligent solutions for the protection, inheritance and dissemination of intangible cultural heritage, as well as new theoretical and practical references for its sustainable development. Furthermore, it is expected that the study will open up new paths for digital humanities research.
Related papers
- CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization [50.90288681622152]
Large Language Models (LLMs) are becoming more deeply integrated into human life across various regions. Existing approaches develop culturally aligned LLMs through fine-tuning with culture-specific corpora. We introduce CAReDiO, a novel cultural data construction framework.
arXiv Detail & Related papers (2025-04-09T13:40:13Z)
- Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts [65.90535970515266]
TimeTravel is a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions. TimeTravel is designed for AI-driven analysis of manuscripts, artworks, inscriptions, and archaeological discoveries. We evaluate contemporary AI models on TimeTravel, highlighting their strengths and identifying areas for improvement.
arXiv Detail & Related papers (2025-02-20T18:59:51Z)
- Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research [23.773194690783512]
Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations. Recent advancements in large language models (LLMs) offer transformative opportunities for addressing these challenges.
arXiv Detail & Related papers (2024-11-30T00:10:56Z)
- LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models [62.47865866398233]
This white paper proposes a framework to generate linguistic tools for low-resource languages.
By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity.
arXiv Detail & Related papers (2024-11-20T16:59:41Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- On the Preservation of Africa's Cultural Heritage in the Age of Artificial Intelligence [0.0]
The paper traces the stages of knowledge dissemination from oral traditions to the digital era, highlighting the significance of languages and cultural diversity in this progression. It also explores the impact of digital technologies on memory, communication, and cultural preservation, emphasizing the need for promoting a culture of the digital (rather than a digital culture) in Africa and beyond.
arXiv Detail & Related papers (2024-03-11T16:18:40Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Cultural Compass: Predicting Transfer Learning Success in Offensive Language Detection with Cultural Features [19.72091739119933]
Our study delves into the intersection of cultural features and transfer learning effectiveness.
Based on these results, we advocate for the integration of cultural information into datasets.
Our research signifies a step forward in the quest for more inclusive, culturally sensitive language technologies.
arXiv Detail & Related papers (2023-10-10T09:29:38Z)
- Learning Robust Real-Time Cultural Transmission without Human Data [82.05222093231566]
We provide a method for generating zero-shot, high recall cultural transmission in artificially intelligent agents.
Our agents succeed at real-time cultural transmission from humans in novel contexts without using any pre-collected human data.
This paves the way for cultural evolution as an algorithm for developing artificial general intelligence.
arXiv Detail & Related papers (2022-03-01T19:32:27Z)