A Corpus of Adpositional Supersenses for Mandarin Chinese
- URL: http://arxiv.org/abs/2003.08437v1
- Date: Wed, 18 Mar 2020 18:59:55 GMT
- Title: A Corpus of Adpositional Supersenses for Mandarin Chinese
- Authors: Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider
- Abstract summary: This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese.
Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria.
We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adpositions are frequent markers of semantic relations, but they are highly
ambiguous and vary significantly from language to language. Moreover, there is
a dearth of annotated corpora for investigating the cross-linguistic variation
of adposition semantics, or for building multilingual disambiguation systems.
This paper presents a corpus in which all adpositions have been semantically
annotated in Mandarin Chinese; to the best of our knowledge, this is the first
Chinese corpus to be broadly annotated with adposition semantics. Our approach
adapts a framework that defined a general set of supersenses according to
ostensibly language-independent semantic criteria, though its development
focused primarily on English prepositions (Schneider et al., 2018). We find
that the supersense categories are well-suited to Chinese adpositions despite
syntactic differences from English. On a Mandarin translation of The Little
Prince, we achieve high inter-annotator agreement and analyze semantic
correspondences of adposition tokens in bitext.
Related papers
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multi concepts for multilingual semantic matching to liberate the model from the reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark [21.91454409571424]
We propose a comprehensive multi-level proposition classification system based on linguistics and logic.
We create a large-scale Chinese proposition dataset PEACE from multiple domains.
Results show the importance of properly modeling the semantic features of propositions.
arXiv Detail & Related papers (2023-09-18T09:18:39Z)
- Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis [8.883799596036484]
This paper presents a corpus-based analysis of argument structure errors in learner Chinese.
The data for analysis includes sentences produced by language learners as well as their corrections by native speakers.
We couple the data with semantic role labeling annotations that are manually created by two senior students.
arXiv Detail & Related papers (2023-08-17T21:10:04Z)
- Discourse Representation Structure Parsing for Chinese [8.846860617823005]
We explore the feasibility of Chinese semantic parsing in the absence of labeled data for Chinese meaning representations.
We propose a test suite designed explicitly for Chinese semantic parsing, which provides fine-grained evaluation for parsing performance.
Our experimental results show that the difficulty of Chinese semantic parsing is mainly caused by adverbs.
arXiv Detail & Related papers (2023-06-16T09:47:45Z)
- Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training [50.100992353488174]
We introduce CDBERT, a new learning paradigm that enhances the semantic understanding of Chinese PLMs with dictionary knowledge and the structure of Chinese characters.
We name the two core modules of CDBERT as Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries.
Our paradigm demonstrates consistent improvements on previous Chinese PLMs across all tasks.
arXiv Detail & Related papers (2023-05-30T05:48:36Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- Annotation of Chinese Predicate Heads and Relevant Elements [20.427035216455366]
A predicate head is a verbal expression that plays a role as the structural center of a sentence.
This paper develops an annotation guideline for Chinese predicate heads and their relevant syntactic elements.
arXiv Detail & Related papers (2021-03-23T03:11:59Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Unique Chinese Linguistic Phenomena [4.020523898765406]
Linguistics holds unique characteristics of generality, stability, and nationality.
The diversities between Chinese and English linguistics are mainly reflected in morphology and syntax.
arXiv Detail & Related papers (2020-02-23T12:13:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.