Deciphering Oracle Bone Language with Diffusion Models
- URL: http://arxiv.org/abs/2406.00684v1
- Date: Sun, 2 Jun 2024 09:42:23 GMT
- Title: Deciphering Oracle Bone Language with Diffusion Models
- Authors: Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu
- Abstract summary: Oracle Bone Script (OBS) originated from China's Shang Dynasty approximately 3,000 years ago.
This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD).
OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages.
- Score: 70.69739681961558
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a novel frontier for OBS decipherment, challenging traditional NLP methods that rely heavily on large textual corpora, a luxury not afforded by historical languages. This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). Utilizing a conditional diffusion-based strategy, OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages. To validate its efficacy, extensive experiments were conducted on an oracle bone script dataset, with quantitative results demonstrating the effectiveness of OBSD. Code and decipherment results will be made available at https://github.com/guanhaisu/OBSD.
Related papers
- A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions [12.664292922995532]
Oracle Bone Inscription (OBI) is the earliest mature writing system known in China to date.
We propose a cross-font image retrieval network (CFIRN) to decipher OBI characters.
arXiv Detail & Related papers (2024-09-10T10:04:58Z)
- Oracle Bone Inscriptions Multi-modal Dataset [58.20314888996118]
Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography.
This paper proposes an Oracle Bone Inscriptions Multi-modal dataset, which includes annotation information for 10,077 pieces of oracle bones.
This dataset can be used for a variety of AI-related research tasks relevant to the field of OBI, such as OBI Character Detection and Recognition, Rubbing Denoising, Character Matching, Character Generation, Reading Sequence Prediction, Missing Characters Completion, and so on.
arXiv Detail & Related papers (2024-07-04T12:47:32Z)
- Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction [73.26364649572237]
Oracle Bone Inscriptions is one of the oldest existing forms of writing in the world.
A large number of Oracle Bone Inscriptions (OBI) remain undeciphered, making it one of the global challenges in paleography today.
This paper introduces a novel approach, namely Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters through radical reconstruction.
arXiv Detail & Related papers (2024-06-05T07:34:39Z)
- An open dataset for oracle bone script recognition and decipherment [66.35957530824872]
Oracle bone script, one of the earliest known forms of ancient Chinese writing, presents invaluable research materials for scholars studying the humanities and geography of the Shang Dynasty, dating back 3,000 years.
The passage of time has obscured much of their meaning, presenting a significant challenge in deciphering these ancient texts.
With the advent of Artificial Intelligence (AI), employing AI to assist in deciphering Oracle Bone Characters (OBCs) has become a feasible option.
This dataset encompasses 77,064 images of 1,588 individual deciphered characters and 62,989 images of 9,411 undeciphered characters, with a total of 140,
arXiv Detail & Related papers (2024-01-27T09:54:16Z)
- An open dataset for the evolution of oracle bone characters: EVOBC [72.91231825135665]
The earliest extant Chinese characters originate from oracle bone inscriptions, which are closely related to other East Asian languages.
In this study, we systematically collected ancient characters from authoritative texts and websites spanning six historical stages.
We constructed an extensive dataset, consisting of 229,170 images representing 13,714 distinct character categories.
arXiv Detail & Related papers (2024-01-23T03:30:47Z)
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- CBAG: Conditional Biomedical Abstract Generation [1.2633386045916442]
We propose a transformer-based conditional language model with a shallow encoder "condition" stack, and a deep "language model" stack of multi-headed attention blocks.
We generate biomedical abstracts given only a proposed title, an intended publication year, and a set of keywords.
arXiv Detail & Related papers (2020-02-13T17:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.