Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
- URL: http://arxiv.org/abs/2601.05508v1
- Date: Fri, 09 Jan 2026 03:30:12 GMT
- Title: Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
- Authors: Fuwen Luo, Zihao Wan, Ziyue Wang, Yaluo Liu, Pau Tong Lin Xu, Xuanjia Qiao, Xiaolong Wang, Peng Li, Yang Liu
- Abstract summary: Hieroglyphic Stroke Analyzer (HieroSA) is a framework that transforms modern logographic and ancient hieroglyphic character images into explicit, interpretable line-segment representations. We show that HieroSA effectively captures character-internal structures and semantics, bypassing the need for language-specific priors.
- Score: 13.56721856255538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hieroglyphs, as logographic writing systems, encode rich semantic and cultural information within their internal structural composition. Yet, current advanced Large Language Models (LLMs) and Multimodal LLMs (MLLMs) usually remain structurally blind to this information. LLMs process characters as textual tokens, while MLLMs additionally view them as raw pixel grids. Both fall short of modeling the underlying logic of character strokes. Furthermore, existing structural analysis methods are often script-specific and labor-intensive. In this paper, we propose Hieroglyphic Stroke Analyzer (HieroSA), a novel and generalizable framework that enables MLLMs to automatically derive stroke-level structures from character bitmaps without handcrafted data. It transforms modern logographic and ancient hieroglyphic character images into explicit, interpretable line-segment representations in a normalized coordinate space, allowing for cross-lingual generalization. Extensive experiments demonstrate that HieroSA effectively captures character-internal structures and semantics, bypassing the need for language-specific priors. Experimental results highlight the potential of our work as a graphematics analysis tool for a deeper understanding of hieroglyphic scripts. View our code at https://github.com/THUNLP-MT/HieroSA.
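As a rough illustration of the target representation (line segments in a normalized coordinate space), the sketch below converts a glyph bitmap into such segments with classical skeletonization and a probabilistic Hough transform from scikit-image. This is an assumption-laden stand-in for illustration only, not the MLLM-based pipeline HieroSA actually uses; the file name and thresholds are arbitrary.

```python
# Minimal sketch: character bitmap -> line segments in normalized [0, 1] space.
# NOT the paper's method; classical skeletonization + Hough, purely illustrative.
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.morphology import skeletonize
from skimage.transform import probabilistic_hough_line

def bitmap_to_segments(path, threshold=0.5):
    """Return line segments as ((x0, y0), (x1, y1)) tuples in [0, 1] coordinates."""
    img = imread(path)
    if img.ndim == 3:                      # drop color/alpha channels if present
        img = rgb2gray(img[..., :3])
    binary = img < threshold               # assume dark glyph on light background
    skeleton = skeletonize(binary)         # reduce strokes to 1-pixel-wide paths
    segments = probabilistic_hough_line(
        skeleton, threshold=10, line_length=5, line_gap=3
    )
    h, w = skeleton.shape
    # Normalize pixel coordinates so the representation is resolution-independent.
    return [((x0 / w, y0 / h), (x1 / w, y1 / h)) for (x0, y0), (x1, y1) in segments]

if __name__ == "__main__":
    for seg in bitmap_to_segments("glyph.png"):
        print(seg)
```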
Related papers
- <SOG_k>: One LLM Token for Explicit Graph Structural Understanding [57.017902343605364]
We propose to incorporate one special token <SOG_k> to fully represent the Structure Of Graph within a unified token space. <SOG_k> empowers LLMs to understand, generate, and reason in a concise and accurate manner.
arXiv Detail & Related papers (2026-02-02T07:55:09Z) - LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs [40.11215282864732]
We introduce LatentLens, a novel approach for mapping latent representations to natural-language descriptions. We evaluate this method on 10 different Vision-Language Models (VLMs). We show that the descriptions produced by LatentLens are semantically meaningful and provide more fine-grained interpretations for humans.
arXiv Detail & Related papers (2026-01-31T02:33:07Z) - Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders [51.380449540006985]
Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does performance still favor the dominant training language? We analyze their internal mechanisms using cross-layer transcoders (CLT) and attribution graphs.
arXiv Detail & Related papers (2025-11-13T22:51:06Z) - Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters [25.430820735194768]
Large language models (LLMs) can spell out tokens character by character with high accuracy, yet they struggle with more complex character-level tasks. We investigate how LLMs internally represent and utilize character-level information during the spelling-out process.
arXiv Detail & Related papers (2025-06-12T12:27:41Z) - Unify Graph Learning with Text: Unleashing LLM Potentials for Session Search [35.20525123189316]
Session search involves a series of interactive queries and actions to fulfill a user's complex information need. Current strategies typically prioritize sequential modeling for deep semantic understanding, overlooking the graph structure in interactions. We propose Symbolic Graph Ranker (SGR), which aims to take advantage of both text-based and graph-based approaches.
arXiv Detail & Related papers (2025-05-20T10:05:06Z) - Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion [20.973071287301067]
Large Language Models (LLMs) possess massive inherent knowledge and superior semantic comprehension capability. Empirical evidence suggests that LLMs consistently perform worse than conventional knowledge graph completion approaches. We propose a novel instruction-tuning-based method, namely FtG, to address these challenges.
arXiv Detail & Related papers (2024-12-12T09:22:04Z) - LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP [30.804518354947565]
A large portion of logographic data persists in a purely visual form due to the absence of transcription.
This issue poses a bottleneck for researchers seeking to apply NLP toolkits to study ancient logographic languages.
We introduce LogogramNLP, the first benchmark enabling NLP analysis of ancient logographic languages.
arXiv Detail & Related papers (2024-08-08T17:58:06Z) - Large Language Models on Graphs: A Comprehensive Survey [77.16803297418201]
We provide a systematic review of scenarios and techniques related to large language models on graphs.
We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs.
We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv Detail & Related papers (2023-12-05T14:14:27Z) - Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? [18.328637750057037]
Large language models (LLMs) are gaining increasing attention for their capability to process graphs with rich text attributes.
We aim to understand why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs.
arXiv Detail & Related papers (2023-09-28T16:58:37Z) - CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model [55.321010757641524]
We introduce CLIP4STR, a simple yet effective STR method built upon the image and text encoders of CLIP. We scale CLIP4STR in terms of model size, pre-training data, and training data, achieving state-of-the-art performance on 13 STR benchmarks.
arXiv Detail & Related papers (2023-05-23T12:51:20Z) - Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z) - Enabling Language Models to Fill in the Blanks [81.59381915581892]
We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document.
We train (or fine-tune) off-the-shelf language models on sequences containing the concatenation of artificially-masked text and the text which was masked.
We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.
arXiv Detail & Related papers (2020-05-11T18:00:03Z)
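To make the infilling-by-language-modeling recipe above concrete, here is a minimal, hypothetical sketch of how one such training sequence could be assembled; the special-token names and span-sampling policy are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch (my own illustration, not the paper's released code):
# mask a span, then concatenate the masked document with the answer so an
# ordinary language model can be fine-tuned on the resulting sequence.
import random

def make_infilling_example(text, mask_token="[blank]", sep_token="[sep]", end_token="[answer]"):
    """Replace one random word span with a blank and append the masked span."""
    words = text.split()
    start = random.randrange(len(words))
    end = random.randrange(start + 1, min(start + 5, len(words)) + 1)
    masked_span = " ".join(words[start:end])
    context = " ".join(words[:start] + [mask_token] + words[end:])
    # The LM is trained on the whole sequence; at inference it is prompted with
    # everything up to [sep] and generates the infilled text for the blank.
    return f"{context} {sep_token} {masked_span} {end_token}"

print(make_infilling_example("She ate leftover pasta for lunch today"))
```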