HieroLM: Egyptian Hieroglyph Recovery with Next Word Prediction Language Model
- URL: http://arxiv.org/abs/2503.04996v1
- Date: Thu, 06 Mar 2025 21:53:49 GMT
- Title: HieroLM: Egyptian Hieroglyph Recovery with Next Word Prediction Language Model
- Authors: Xuheng Cai, Erica Zhang
- Abstract summary: Egyptian hieroglyphs are found on numerous ancient Egyptian artifacts, but they are often blurry or even missing due to erosion. Existing efforts to restore blurry hieroglyphs adopt computer vision techniques such as CNNs and model hieroglyph recovery as an image classification task. This paper proposes a novel approach that models hieroglyph recovery as a next-word prediction task and uses language models to address it.
- Score: 1.1510009152620668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Egyptian hieroglyphs are found on numerous ancient Egyptian artifacts, but they are often blurry or even missing due to erosion. Existing efforts to restore blurry hieroglyphs adopt computer vision techniques such as CNNs and model hieroglyph recovery as an image classification task, which suffers from two major limitations: (i) they cannot handle severely damaged or completely missing hieroglyphs, and (ii) they make predictions based on a single hieroglyph without considering contextual and grammatical information. This paper proposes a novel approach that models hieroglyph recovery as a next-word prediction task and uses language models to address it. We compare the performance of different SOTA language models and choose LSTM as the architecture of our HieroLM due to the strong local affinity of semantics in Egyptian hieroglyph texts. Experiments show that HieroLM achieves over 44% accuracy and maintains notable performance on multi-shot predictions and scarce data, which makes it a pragmatic tool to assist scholars in inferring missing hieroglyphs. It can also complement CV-based models to significantly reduce perplexity in recognizing blurry hieroglyphs. Our code is available at https://github.com/Rick-Cai/HieroLM/.
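As a rough, hedged sketch of the modeling idea (next-word prediction over hieroglyph token sequences), not the authors' released implementation, an LSTM language model in PyTorch might look like the following; vocabulary size, embedding size, and hidden size are placeholder assumptions.

```python
import torch
import torch.nn as nn

class HieroglyphLSTM(nn.Module):
    """Minimal LSTM language model that predicts the next hieroglyph token."""

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer hieroglyph IDs
        hidden, _ = self.lstm(self.embed(tokens))
        return self.proj(hidden)  # (batch, seq_len, vocab_size) next-token logits

# Next-word objective: targets are the input sequence shifted by one position.
model = HieroglyphLSTM(vocab_size=1000)    # placeholder vocabulary size
tokens = torch.randint(0, 1000, (2, 16))   # dummy token IDs for shape checking
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
)
```

Under this framing, a damaged position is filled by ranking the vocabulary under the next-token distribution conditioned on the preceding context, which is the recovery setting the abstract describes.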
Related papers
- Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs [0.0]
This paper presents a novel method for generating datasets of ancient Egyptian hieroglyphs by applying NST to a digital typeface.
Experimental results show that image classification models trained on NST-generated examples and on photographs demonstrate equal performance and transferability to real, unseen images of hieroglyphs.
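A minimal sketch of the style-transfer ingredient this entry builds on, assuming the classic Gram-matrix formulation over pretrained VGG-16 features; the layer indices and weighting are illustrative, not the paper's exact pipeline.

```python
import torch
import torchvision.models as models

# Pretrained VGG-16 feature extractor (frozen), a common NST backbone.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: (batch, channels, h, w) -> channel-by-channel correlations
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(generated: torch.Tensor, style: torch.Tensor,
               layers=(3, 8, 15, 22)) -> torch.Tensor:
    # Compare Gram matrices at a few (illustrative) VGG layers.
    loss, x, y = 0.0, generated, style
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in layers:
            loss = loss + torch.mean((gram_matrix(x) - gram_matrix(y)) ** 2)
    return loss
```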
arXiv Detail & Related papers (2025-04-02T22:30:45Z)
- Web Artifact Attacks Disrupt Vision Language Models [61.59021920232986]
Vision-language models (VLMs) are trained on large-scale, lightly curated web datasets.
They learn unintended correlations between semantic concepts and unrelated visual signals.
Prior work has weaponized these correlations as an attack vector to manipulate model predictions.
We introduce artifact-based attacks: a novel class of manipulations that mislead models using both non-matching text and graphical elements.
arXiv Detail & Related papers (2025-03-17T18:59:29Z)
- What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key aspects: they should be informationally sufficient, minimally redundant, and readily comprehensible by humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z)
- Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
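A minimal sketch of this kind of layer-wise probe, assuming a Hugging Face checkpoint and cosine similarity as the comparison; the token position and similarity measure are illustrative choices, not the authors' exact protocol.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The checkpoint is illustrative; any causal LM exposing hidden states works.
name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True)
model.eval()

with torch.no_grad():
    inputs = tok("The bank of the river", return_tensors="pt")
    out = model(**inputs)

# out.hidden_states: tuple of (num_layers + 1) tensors, each (batch, seq, dim).
word_idx = 1  # token position of interest (illustrative choice)
base = out.hidden_states[1][0, word_idx]  # lowest transformer layer, near-lexical
for layer, h in enumerate(out.hidden_states):
    sim = torch.cosine_similarity(base, h[0, word_idx], dim=0).item()
    print(f"layer {layer:2d}: cosine to layer-1 representation = {sim:.3f}")
```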
arXiv Detail & Related papers (2024-03-03T13:14:47Z)
- Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning [90.13978453378768]
We introduce a comprehensive typology of factual errors in generated chart captions.
A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models.
Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies.
arXiv Detail & Related papers (2023-12-15T19:16:21Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale [59.01246141215051]
We analyze the factors that lead to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study of the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- AGTGAN: Unpaired Image Translation for Photographic Ancient Character Generation [27.77329906930072]
We propose an unsupervised generative adversarial network called AGTGAN.
By explicit global and local glyph shape style modeling, our method can generate characters with diverse glyphs and realistic textures.
With our generated images, experiments on the largest photographic oracle bone character dataset show that our method can achieve a significant increase in classification accuracy, up to 16.34%.
arXiv Detail & Related papers (2023-03-13T11:18:41Z)
- Artifact Reduction in Fundus Imaging using Cycle Consistent Adversarial Neural Networks [0.0]
Deep learning is a powerful tool to extract patterns from data without much human intervention.
We attempt to automatically rectify such artifacts present in fundus images.
We use a CycleGAN-based model that consists of residual blocks to reduce the artifacts in the images.
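A minimal sketch of the residual block that CycleGAN-style generators typically stack; the channel count and normalization choices are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block of the kind stacked inside a CycleGAN generator."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity skip preserves image content; the body learns the correction.
        return x + self.body(x)

# Cycle consistency: translating artifacted -> clean -> artifacted should
# reconstruct the input, e.g. l1_loss(G_back(G_forward(x)), x).
```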
arXiv Detail & Related papers (2021-12-25T18:05:48Z)
- Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach [8.00388161728995]
We present models which complete missing text given transliterations of ancient Mesopotamian documents.
Due to the tablets' deterioration, scholars often rely on contextual cues to manually fill in missing parts in the text.
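A hedged sketch of the general masked-language-modelling completion idea, using a generic multilingual checkpoint purely for illustration; the paper itself works on transliterated Akkadian, which is not assumed here.

```python
from transformers import pipeline

# Generic multilingual masked LM, for illustration only.
fill = pipeline("fill-mask", model="bert-base-multilingual-cased")

# A damaged line with one missing word, marked with the mask token.
text = f"the king of the land of {fill.tokenizer.mask_token} built this temple"
for candidate in fill(text, top_k=3):
    print(f"{candidate['token_str']!r}  score={candidate['score']:.3f}")
```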
arXiv Detail & Related papers (2021-09-09T18:58:14Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We explore the generation of font glyphs as 2D graphic objects, using a graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image decoder.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
- Few-Shot Font Generation with Deep Metric Learning [33.12829580813688]
The proposed framework introduces deep metric learning to style encoders.
We performed experiments using black-and-white and shape-distinctive font datasets.
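A minimal sketch of deep metric learning applied to a style encoder via a triplet loss; the toy encoder, margin, and image sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy style encoder; the real model would be a deeper CNN.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
)
triplet = nn.TripletMarginLoss(margin=1.0)

# anchor/positive: glyph images from the same font; negative: a different font.
anchor, positive, negative = (torch.randn(8, 1, 64, 64) for _ in range(3))
loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
# Pulls same-font styles together and pushes different fonts apart in embedding space.
```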
arXiv Detail & Related papers (2020-11-04T10:12:10Z)
- PhishGAN: Data Augmentation and Identification of Homoglyph Attacks [0.0]
Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack.
Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of hieroglyphs.
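A minimal sketch of a conditional GAN generator; for simplicity it conditions on a class label, whereas PhishGAN conditions on input text images, and all sizes here are assumptions.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy cGAN generator: noise + condition embedding -> image."""

    def __init__(self, noise_dim: int = 100, num_classes: int = 10, img_size: int = 28):
        super().__init__()
        self.embed = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, 256), nn.ReLU(),
            nn.Linear(256, img_size * img_size), nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        x = torch.cat([z, self.embed(labels)], dim=1)  # condition the noise vector
        return self.net(x).view(-1, 1, self.img_size, self.img_size)

z = torch.randn(4, 100)
labels = torch.randint(0, 10, (4,))
fake = ConditionalGenerator()(z, labels)  # (4, 1, 28, 28)
```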
arXiv Detail & Related papers (2020-06-24T13:59:09Z)