Cross-strait Variations on Two Near-synonymous Loanwords xie2shang1 and
tan2pan4: A Corpus-based Comparative Study
- URL: http://arxiv.org/abs/2210.04161v1
- Date: Sun, 9 Oct 2022 04:10:58 GMT
- Authors: Yueyue Huang, Chu-Ren Huang
- Abstract summary: This study attempts to investigate cross-strait variations on two typical synonymous loanwords in Chinese, i.e. xie2shang1 and tan2pan4.
Through a comparative analysis, the study found some distributional, eventual, and contextual similarities and differences across Taiwan and Mainland Mandarin.
- Score: 2.6194322370744305
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study attempts to investigate cross-strait variations on two typical
synonymous loanwords in Chinese, i.e. xie2shang1 and tan2pan4, drawing on MARVS
theory. Through a comparative analysis, the study found some distributional,
eventual, and contextual similarities and differences across Taiwan and
Mainland Mandarin. Compared with the underused tan2pan4, xie2shang1 is
significantly overused in Taiwan Mandarin and vice versa in Mainland Mandarin.
Additionally, though both words can refer to an inchoative process in Mainland
and Taiwan Mandarin, the starting point for xie2shang1 in Mainland Mandarin is
somewhat blurred compared with the usage in Taiwan Mandarin. Furthermore, in
Taiwan Mandarin, tan2pan4 can be used in economic and diplomatic contexts,
while xie2shang1 is used almost exclusively in political contexts. In Mainland
Mandarin, however, the two words can be used in a hybrid manner within
political contexts; moreover, tan2pan4 is prominently used in diplomatic
contexts with less reference to economic activities, while xie2shang1 can be
found in both political and legal contexts, emphasizing a role of mediation.
Related papers
- A Topic-aware Comparable Corpus of Chinese Variations [0.6906005491572401]
Using Dcard for Taiwanese Mandarin and Sina Weibo for Mainland Chinese, we create a comparable corpus that updates regularly and reflects modern language use on social media.
arXiv Detail & Related papers (2024-11-17T04:06:12Z)
- A corpus-based investigation of pitch contours of monosyllabic words in conversational Taiwan Mandarin [3.072340427031969]
We analyze the F0 contours of 3824 tokens of 63 different word types in a spontaneous Taiwan Mandarin corpus.
We show that the tonal context substantially modifies a word's canonical tone.
We also show that word, and even more so, word sense, co-determine words' F0 contours.
arXiv Detail & Related papers (2024-09-12T09:51:56Z)
- Form and meaning co-determine the realization of tone in Taiwan Mandarin spontaneous speech: the case of Tone 3 sandhi [1.7723990552388866]
In Standard Chinese, Tone 3 (the dipping tone) becomes Tone 2 (rising tone) when followed by another Tone 3.
Previous studies have noted that this sandhi process may be incomplete, in the sense that the assimilated Tone 3 is still distinct from a true Tone 2.
The present study investigates the pitch contours of two-character words with T2-T3 and T3-T3 tone patterns in spontaneous Taiwan Mandarin conversations.
arXiv Detail & Related papers (2024-08-28T12:25:45Z)
- Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems [4.150560582918129]
We employ a pre-trained LLaMA 2-7B model specialized in Traditional Mandarin Chinese to leverage the orthographic similarities between Taiwanese Hokkien Han and Traditional Mandarin Chinese.
We find that the use of a limited monolingual corpus still further improves the model's Taiwanese Hokkien capabilities.
arXiv Detail & Related papers (2024-03-18T17:56:13Z)
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z)
- Cross-Lingual Speaker Identification Using Distant Supervision [84.51121411280134]
We propose a speaker identification framework that addresses issues such as lack of contextual reasoning and poor cross-lingual generalization.
We show that the resulting model outperforms previous state-of-the-art methods on two English speaker identification benchmarks by up to 9% in accuracy and 5% with only distant supervision.
arXiv Detail & Related papers (2022-10-11T20:49:44Z)
- When is Wall a Pared and when a Muro? -- Extracting Rules Governing Lexical Selection [85.0262994506624]
We present a method for automatically identifying fine-grained lexical distinctions.
We extract concise descriptions explaining these distinctions in a human- and machine-readable format.
We use these descriptions to teach non-native speakers when to translate a given ambiguous word into its different possible translations.
arXiv Detail & Related papers (2021-09-13T14:49:00Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- A Corpus of Adpositional Supersenses for Mandarin Chinese [15.757892250956715]
This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese.
Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria.
We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English.
arXiv Detail & Related papers (2020-03-18T18:59:55Z)