Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases
- URL: http://arxiv.org/abs/2403.04376v1
- Date: Thu, 7 Mar 2024 10:06:54 GMT
- Title: Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases
- Authors: Yuqi Liu, Guanyi Chen, Kees van Deemter
- Abstract summary: We focus on the omission of the plurality and definiteness markers in Chinese noun phrases (NPs).
We build a corpus of Chinese NPs, each of which is accompanied by its corresponding context, and by labels indicating its singularity/plurality and definiteness/indefiniteness.
We train a bank of computational models using both classic machine learning models and state-of-the-art pre-trained language models to predict the plurality and definiteness of each NP.
- Score: 13.317456093426808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Theoretical linguists have suggested that some languages (e.g., Chinese and
Japanese) are "cooler" than other languages based on the observation that the
intended meaning of phrases in these languages depends more on their contexts.
As a result, many expressions in these languages are shortened, and their
meaning is inferred from the context. In this paper, we focus on the omission
of the plurality and definiteness markers in Chinese noun phrases (NPs) to
investigate the predictability of their intended meaning given the contexts. To
this end, we built a corpus of Chinese NPs, each of which is accompanied by its
corresponding context, and by labels indicating its singularity/plurality and
definiteness/indefiniteness. We carried out corpus assessments and analyses.
The results suggest that Chinese speakers indeed drop plurality and
definiteness markers very frequently. Building on the corpus, we train a bank
of computational models using both classic machine learning models and
state-of-the-art pre-trained language models to predict the plurality and
definiteness of each NP. We report on the performance of these models and
analyse their behaviours.
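As a rough illustration of the setup described in the abstract, the sketch below shows one way a corpus item (an NP, its context, and its two labels) might be represented, together with a majority-class baseline of the kind trained models are usually compared against. The field names, the `NPItem` class, the helper functions, and the toy sentences are all hypothetical and are not taken from the paper's corpus or code. The toy NPs carry overt markers only so that their labels are unambiguous; the paper itself is concerned with NPs where such markers are dropped.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NPItem:
    """One hypothetical corpus item: an NP, its context, and two binary labels."""
    context: str    # sentence in which the NP occurs
    np: str         # the noun phrase itself
    plural: bool    # True = plural, False = singular
    definite: bool  # True = definite, False = indefinite


# Toy items invented for illustration; NOT drawn from the paper's corpus.
TOY_ITEMS = [
    NPItem("我买了一本书。", "一本书", plural=False, definite=False),
    NPItem("那些学生已经走了。", "那些学生", plural=True, definite=True),
    NPItem("这个问题很难。", "这个问题", plural=False, definite=True),
    NPItem("桌子上有三个苹果。", "三个苹果", plural=True, definite=False),
    NPItem("他养了一只猫。", "一只猫", plural=False, definite=False),
]


def majority_label(items, attr):
    """Return the most frequent value of the label `attr` among `items`."""
    counts = Counter(getattr(item, attr) for item in items)
    return counts.most_common(1)[0][0]


def accuracy(items, attr, predicted):
    """Accuracy of always predicting `predicted` for the label `attr`."""
    hits = sum(getattr(item, attr) == predicted for item in items)
    return hits / len(items)


if __name__ == "__main__":
    for attr in ("plural", "definite"):
        guess = majority_label(TOY_ITEMS, attr)
        print(attr, guess, accuracy(TOY_ITEMS, attr, guess))
```

A context-aware classifier would replace `majority_label` with a model that reads `context`; the baseline only fixes a floor that such a model should exceed.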
Related papers
- To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese [26.659122101710068]
We study whether and why a particular argument should be omitted across over 2,000 data points in the balanced corpus of Japanese.
The data indicate that native speakers overall share common criteria for such judgments.
The gap between the systems' prediction and human judgments in specific linguistic aspects is revealed.
arXiv Detail & Related papers (2024-04-17T12:26:52Z)
- Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
arXiv Detail & Related papers (2023-06-02T17:59:09Z)
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z)
- Understanding the Use of Quantifiers in Mandarin [7.249126423531564]
We introduce a corpus of short texts in Mandarin, in which quantified expressions figure prominently.
We examine the hypothesis that speakers of East Asian languages speak more briefly but less informatively than speakers of West-European languages.
arXiv Detail & Related papers (2022-09-24T10:43:07Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Controlled Evaluation of Grammatical Knowledge in Mandarin Chinese Language Models [22.57309958548928]
We investigate whether structural supervision improves language models' ability to learn grammatical dependencies in typologically different languages.
We train LSTMs, Recurrent Neural Network Grammars, Transformer language models, and generative parsing models on datasets of different sizes.
We find suggestive evidence that structural supervision helps with representing syntactic state across intervening content and improves performance in low-data settings.
arXiv Detail & Related papers (2021-09-22T22:11:30Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching [29.318730227080675]
We introduce HowNet as an external knowledge base and propose a Linguistic knowledge Enhanced graph Transformer (LET) to deal with word ambiguity.
Experimental results on two Chinese datasets show that our models outperform various typical text matching approaches.
arXiv Detail & Related papers (2021-02-25T04:01:51Z)
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model [66.84264870118723]
We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
arXiv Detail & Related papers (2020-10-09T18:27:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.