Evolution of Part-of-Speech in Classical Chinese
- URL: http://arxiv.org/abs/2009.11144v1
- Date: Wed, 23 Sep 2020 13:41:27 GMT
- Title: Evolution of Part-of-Speech in Classical Chinese
- Authors: Bai Li
- Abstract summary: Bisang (2008) claimed that Classical Chinese is a precategorical language, where the syntactic position of a word determines its part-of-speech category.
We apply entropy-based metrics to evaluate these claims on historical corpora.
- Score: 2.870517198186329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical Chinese is a language notable for its word class flexibility: the
same word may often be used as a noun or a verb. Bisang (2008) claimed that
Classical Chinese is a precategorical language, where the syntactic position of
a word determines its part-of-speech category. In this paper, we apply
entropy-based metrics to evaluate these claims on historical corpora. We
further explore differences between nouns and verbs in Classical Chinese: using
psycholinguistic norms, we find a positive correlation between concreteness and
noun usage. Finally, we align character embeddings from Classical and Modern
Chinese, and find that verbs undergo more semantic change than nouns.
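The abstract does not define the entropy-based metrics. The sketch below shows one plausible way to set up such a test, under stated assumptions. Each token is annotated with a word, a syntactic slot, and a POS tag. We then compare the conditional entropy of the POS tag given the word, H(POS | word), with the conditional entropy given the slot, H(POS | slot). If syntactic position determines category, as Bisang (2008) claims, H(POS | slot) should come out much lower than H(POS | word). The toy tokens, the slot labels, and the function names are hypothetical. They are not taken from the paper or its corpora, and this comparison is not necessarily the paper's own metric.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    # p * log2(1/p) avoids returning -0.0 for a single-outcome distribution
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def conditional_entropy(pairs):
    """H(Y | X) in bits, estimated from a list of (x, y) observations.

    Computed as the frequency-weighted average of H(Y | X = x).
    """
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    n = len(pairs)
    return sum(
        (sum(c.values()) / n) * entropy(list(c.values()))
        for c in by_x.values()
    )


# Hypothetical toy tokens: (word, syntactic slot, POS tag).
tokens = [
    ("w1", "subject", "NOUN"),
    ("w1", "predicate", "VERB"),
    ("w2", "subject", "NOUN"),
    ("w2", "predicate", "VERB"),
    ("w3", "subject", "NOUN"),
    ("w3", "subject", "NOUN"),
]

h_word = conditional_entropy([(w, pos) for w, _, pos in tokens])
h_slot = conditional_entropy([(s, pos) for _, s, pos in tokens])
print(f"H(POS | word) = {h_word:.3f} bits")  # H(POS | word) = 0.667 bits
print(f"H(POS | slot) = {h_slot:.3f} bits")  # H(POS | slot) = 0.000 bits
```

On these toy tokens, word identity leaves two thirds of a bit of uncertainty about the tag, while the slot leaves none. Running the same comparison on a real corpus only makes sense if the POS annotation is not itself derived from position. A tagger that assigns tags from position alone would make the comparison circular.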
Related papers
- Unsupervised Classification of English Words Based on Phonological Information: Discovery of Germanic and Latinate Clusters [9.220284665192663]
Cross-linguistically, native words and loanwords follow different phonological rules.
The Germanic-Latinate distinction in the English lexicon is learnable from the phonotactic information of individual words.
Date: 2025-04-16T05:20:08Z
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
Date: 2024-07-12T22:10:16Z
- Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases [13.317456093426808]
We focus on the omission of the plurality and definiteness markers in Chinese noun phrases (NPs).
We build a corpus of Chinese NPs, each of which is accompanied by its corresponding context, and by labels indicating its singularity/plurality and definiteness/indefiniteness.
We train a bank of computational models using both classic machine learning models and state-of-the-art pre-trained language models to predict the plurality and definiteness of each NP.
Date: 2024-03-07T10:06:54Z
- Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training [50.100992353488174]
We introduce CDBERT, a new learning paradigm that enhances the semantic understanding of Chinese PLMs with dictionary knowledge and the structure of Chinese characters.
We name the two core modules of CDBERT Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries.
Our paradigm demonstrates consistent improvements over previous Chinese PLMs across all tasks.
Date: 2023-05-30T05:48:36Z
- Noun2Verb: Probabilistic frame semantics for word class conversion [8.939269057094661]
We present a formal framework that simulates the production and comprehension of novel denominal verb usages.
We show that a model where the speaker and listener cooperatively learn the joint distribution over semantic frame elements better explains the empirical denominal verb usages.
Date: 2022-05-12T19:16:12Z
- When is Wall a Pared and when a Muro? -- Extracting Rules Governing Lexical Selection [85.0262994506624]
We present a method for automatically identifying fine-grained lexical distinctions.
We extract concise descriptions explaining these distinctions in a human- and machine-readable format.
We use these descriptions to teach non-native speakers when to translate a given ambiguous word into its different possible translations.
Date: 2021-09-13T14:49:00Z
- Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
Date: 2021-01-22T12:04:08Z
- Word class flexibility: A deep contextualized approach [18.50173460090958]
We propose a principled methodology to explore regularity in word class flexibility.
We find that contextualized embeddings capture human judgment of class variation within words in English.
We find greater semantic variation when flexible lemmas are used in their dominant word class.
Date: 2020-09-19T14:41:50Z
- Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia [3.2851864672627618]
Chinese dynastic histories form a large continuous linguistic space of approximately 2000 years, from the 3rd century BCE to the 18th century CE.
The histories are documented in Classical (Literary) Chinese in a corpus of over 20 million characters, suitable for the computational analysis of historical lexicon and semantic change.
This project introduces a new open-source corpus of twenty-four dynastic histories covered by a Creative Commons license.
Date: 2020-05-18T15:14:33Z
- Predicting Declension Class from Form and Meaning [70.65971611552871]
Class membership is far from deterministic, but the phonological form of a noun and/or its meaning can often provide imperfect clues.
We operationalize this by measuring how much information, in bits, we can glean about declension class from knowing the form and/or meaning of nouns.
We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class.
Date: 2020-05-01T21:48:48Z
- A Corpus of Adpositional Supersenses for Mandarin Chinese [15.757892250956715]
This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese.
Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria.
We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English.
Date: 2020-03-18T18:59:55Z
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
Date: 2020-01-21T19:09:49Z
This list is automatically generated from the titles and abstracts of the papers in this site.