Geometric Structures and Patterns of Meaning: A PHATE Manifold Analysis of Chinese Character Embeddings
- URL: http://arxiv.org/abs/2510.01230v1
- Date: Tue, 23 Sep 2025 14:28:34 GMT
- Title: Geometric Structures and Patterns of Meaning: A PHATE Manifold Analysis of Chinese Character Embeddings
- Authors: Wen G. Gong
- Abstract summary: We investigate geometric patterns in Chinese character embeddings using PHATE manifold analysis. We observe clustering patterns for content words and branching patterns for function words.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We systematically investigate geometric patterns in Chinese character embeddings using PHATE manifold analysis. Through cross-validation across seven embedding models and eight dimensionality reduction methods, we observe clustering patterns for content words and branching patterns for function words. Analysis of over 1000 Chinese characters across 12 semantic domains reveals that geometric complexity correlates with semantic content: meaningful characters exhibit rich geometric diversity, while structural radicals collapse into tight clusters. A comprehensive child-network analysis (123 phrases) demonstrates systematic semantic expansion from elemental characters. These findings provide computational evidence supporting traditional linguistic theory and establish a novel framework for geometric analysis of semantic organization.
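The core idea of the PHATE analysis above — embed points via long-range diffusion "potential" distances rather than raw Euclidean distances — can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' pipeline: real PHATE uses an alpha-decay kernel with adaptive bandwidths and metric MDS (see the `phate` package), whereas this sketch uses a plain Gaussian kernel and classical MDS.

```python
import numpy as np

def phate_like_embedding(X, t=8, eps=None, n_components=2):
    """Toy diffusion-potential embedding in the spirit of PHATE.

    X: (n, d) array of character embeddings. Real PHATE uses an
    alpha-decay kernel and metric MDS; this sketch simplifies both.
    """
    # Pairwise squared distances and a Gaussian affinity kernel.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    if eps is None:
        eps = np.median(sq)  # crude bandwidth heuristic
    K = np.exp(-sq / eps)
    # Row-normalize to a diffusion (Markov) operator and power it t steps.
    P = K / K.sum(axis=1, keepdims=True)
    Pt = np.linalg.matrix_power(P, t)
    # Potential distances: distances between -log transition-probability rows.
    U = -np.log(Pt + 1e-12)
    D = np.sqrt(((U[:, None, :] - U[None, :, :]) ** 2).sum(-1))
    # Classical MDS on the potential distances.
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

rng = np.random.default_rng(0)
# Two toy "semantic domains": one tight cluster, one spread-out group.
X = np.vstack([rng.normal(0, 0.1, (20, 5)), rng.normal(3, 1.0, (20, 5))])
Y = phate_like_embedding(X)
print(Y.shape)  # (40, 2)
```

The diffusion step `t` controls the scale of geometry that survives: small `t` preserves local clusters, larger `t` emphasizes the global branching structure the paper reports for function words.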
Related papers
- Geometric Patterns of Meaning: A PHATE Manifold Analysis of Multi-lingual Embeddings [0.0]
We introduce a multi-level analysis framework for examining semantic geometry in multilingual embeddings, implemented through Semanscope. Analysis of diverse datasets spanning sub-character components, alphabetic systems, semantic domains, and numerical concepts reveals systematic geometric patterns and critical limitations in current embedding models. These findings establish PHATE manifold learning as an essential analytic tool, not only for studying the geometric structure of meaning in embedding space but also for validating the effectiveness of embedding models in capturing semantic relationships.
arXiv Detail & Related papers (2025-12-29T14:00:12Z) - From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures [38.75080027435365]
We present a comprehensive analysis of topological and geometric measures across a wide set of text embedding models and datasets. We introduce Unified Topological Signatures (UTS), a holistic framework for characterizing embedding spaces.
arXiv Detail & Related papers (2025-11-27T06:37:45Z) - GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs [59.61242815508687]
Graph neural networks (GNNs) on text-attributed graphs (TAGs) encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. This work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure.
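One way to picture a local-PCA drift measure of this kind — hypothetical, not GeoGNN's actual metric — is to estimate each node's local principal subspace before and after neighborhood aggregation and report the largest principal angle between the two subspaces:

```python
import numpy as np

def local_pca_basis(X, center_idx, k=10, dim=2):
    """Orthonormal basis of the top-`dim` local principal directions
    around one point, estimated from its k nearest neighbors."""
    d = ((X - X[center_idx]) ** 2).sum(1)
    nbrs = np.argsort(d)[: k + 1]          # includes the point itself
    Z = X[nbrs] - X[nbrs].mean(0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:dim].T                      # (d, dim)

def drift_score(X_before, X_after, idx, k=10, dim=2):
    """Drift proxy: largest principal angle (radians) between the local
    PCA subspaces before and after aggregation; 0 means no rotation."""
    A = local_pca_basis(X_before, idx, k, dim)
    B = local_pca_basis(X_after, idx, k, dim)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))                    # toy "PLM" node embeddings
adj = (rng.random((60, 60)) < 0.1).astype(float)
np.fill_diagonal(adj, 1.0)
X_agg = (adj / adj.sum(1, keepdims=True)) @ X   # mean-neighborhood aggregation
score = drift_score(X, X_agg, idx=0)
print(0.0 <= score <= np.pi / 2)
```

Comparing scores across aggregation schemes (mean, attention-weighted, etc.) is the kind of question such a metric makes quantitative.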
arXiv Detail & Related papers (2025-11-12T06:48:43Z) - Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations [34.88156871518115]
Next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy. This insight motivates orthant-based clustering, a method that combines concept signs to identify interpretable semantic categories.
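The sign-combination idea behind orthant-based clustering can be illustrated as follows — a hedged sketch, not the paper's exact method: project embeddings onto a few top singular directions ("concept axes") and label each point by the orthant its sign pattern falls into.

```python
import numpy as np

def orthant_clusters(E, n_concepts=3):
    """Toy orthant-based clustering: project embeddings onto the top
    singular directions and label each point by the sign pattern
    (orthant) of its projections."""
    Ec = E - E.mean(0)
    _, _, Vt = np.linalg.svd(Ec, full_matrices=False)
    proj = Ec @ Vt[:n_concepts].T          # (n, n_concepts) concept scores
    signs = (proj >= 0).astype(int)        # sign pattern per point
    # Encode each sign vector as an integer orthant id in [0, 2**n_concepts).
    return signs @ (2 ** np.arange(n_concepts))

rng = np.random.default_rng(2)
E = rng.normal(size=(100, 16))
labels = orthant_clusters(E)
print(labels.shape, int(labels.min()), int(labels.max()))
```

With `n_concepts` axes there are at most `2**n_concepts` clusters, each interpretable as a conjunction of signed concept directions.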
arXiv Detail & Related papers (2025-05-13T08:46:04Z) - Geometric Signatures of Compositionality Across a Language Model's Lifetime [47.25475802128033]
We study whether contemporary language models reflect the intrinsic simplicity of language enabled by compositionality. We find that the relationship between compositionality and geometric complexity arises from linguistic features learned over training. Our analyses reveal a striking contrast between nonlinear and linear dimensionality, showing that they respectively encode semantic and superficial aspects of linguistic composition.
arXiv Detail & Related papers (2024-10-02T11:54:06Z) - A Joint Matrix Factorization Analysis of Multilingual Representations [28.751144371901958]
We present an analysis tool based on joint matrix factorization for comparing latent representations of multilingual and monolingual models.
We study to what extent and how morphosyntactic features are reflected in the representations learned by multilingual pre-trained models.
arXiv Detail & Related papers (2023-10-24T04:43:45Z) - Discovering Universal Geometry in Embeddings with ICA [3.1921092049934647]
We show that each embedding can be expressed as a composition of a few intrinsic interpretable axes.
The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.
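Expressing an embedding as a composition of a few independent axes can be sketched with a minimal FastICA (tanh nonlinearity, symmetric decorrelation). This is an illustrative numpy implementation under toy assumptions, not the paper's code; in practice one would reach for scikit-learn's `FastICA`.

```python
import numpy as np

def fastica_axes(E, n_axes=4, iters=200, seed=0):
    """Minimal FastICA: whiten, then fixed-point iteration with tanh
    nonlinearity and symmetric decorrelation. Returns (n_axes, d)
    unmixing directions ("interpretable axes")."""
    rng = np.random.default_rng(seed)
    Xc = (E - E.mean(0)).T                 # (d, n)
    # Whiten via eigendecomposition of the sample covariance.
    cov = Xc @ Xc.T / Xc.shape[1]
    w, V = np.linalg.eigh(cov)
    top = np.argsort(w)[::-1][:n_axes]
    Wh = (V[:, top] / np.sqrt(w[top])).T   # (n_axes, d) whitening matrix
    Z = Wh @ Xc                            # whitened data
    W = rng.normal(size=(n_axes, n_axes))
    for _ in range(iters):
        G = np.tanh(W @ Z)
        gprime = 1 - G ** 2
        W = G @ Z.T / Z.shape[1] - gprime.mean(1)[:, None] * W
        # Symmetric decorrelation: W <- (W W^T)^{-1/2} W, via SVD.
        u, _, vt = np.linalg.svd(W)
        W = u @ vt
    return W @ Wh                          # axes in the original d-dim space

rng = np.random.default_rng(3)
# Mix 4 independent non-Gaussian sources into 16-dim "embeddings".
S = rng.laplace(size=(500, 4))
A = rng.normal(size=(4, 16))
E = S @ A
axes = fastica_axes(E)
print(axes.shape)  # (4, 16)
```

Projecting embeddings onto the recovered axes (`E @ axes.T`) yields the coordinates whose interpretability the paper investigates.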
arXiv Detail & Related papers (2023-05-22T16:04:44Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide mechanistic understanding of how transformers learn "semantic structure"
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of scientific meaning is standardised using situation representations rather than psychological properties.
This work lays the groundwork for a geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z) - A singular Riemannian geometry approach to Deep Neural Networks I. Theoretical foundations [77.86290991564829]
Deep Neural Networks are widely used for solving complex problems in several scientific areas, such as speech recognition, machine translation, and image analysis.
We study a particular sequence of maps between manifolds, with the last manifold of the sequence equipped with a Riemannian metric.
We investigate the theoretical properties of the maps in such a sequence, focusing eventually on maps implementing neural networks of practical interest.
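The standard construction underlying such a setup — assumed here to match the paper's framework — is the pullback of the final metric through each layer's differential:

```latex
% For a network \Lambda = f_L \circ \cdots \circ f_1 with
% f_i : M_{i-1} \to M_i and a Riemannian metric g_L on M_L,
% each layer pulls the metric back through its differential:
g_{i-1}(u, v) = g_i\!\left(\mathrm{d}f_i\, u,\; \mathrm{d}f_i\, v\right),
\qquad u, v \in T_p M_{i-1}.
% In coordinates, with J_i the Jacobian of f_i:
%   G_{i-1} = J_i^{\top} G_i\, J_i.
% Since J_i is generally not injective, G_{i-1} can be degenerate,
% hence a *singular* Riemannian geometry.
```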
arXiv Detail & Related papers (2021-12-17T11:43:30Z) - A Frobenius Algebraic Analysis for Parasitic Gaps [4.254099382808598]
We identify two types of parasitic gapping where the duplication of semantic content can be confined to the lexicon.
For parasitic gaps affecting arguments of the same predicate, the polymorphism is associated with the lexical item that introduces the primary gap.
A compositional translation relates syntactic types and derivations to the interpreting compact closed category of finite dimensional vector spaces.
arXiv Detail & Related papers (2020-05-12T09:36:15Z) - Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.