Annotation of Chinese Predicate Heads and Relevant Elements
- URL: http://arxiv.org/abs/2103.12280v1
- Date: Tue, 23 Mar 2021 03:11:59 GMT
- Title: Annotation of Chinese Predicate Heads and Relevant Elements
- Authors: Yanping Chen, Yongbin Qin, Ruizhang Huang, Qinghua Zheng, and Ping Chen
- Abstract summary: A predicate head is a verbal expression that plays a role as the structural center of a sentence.
This paper develops an annotation guideline for Chinese predicate heads and their relevant syntactic elements.
- Score: 20.427035216455366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A predicate head is a verbal expression that plays a role as the structural
center of a sentence. Identifying predicate heads is critical to understanding
a sentence. It plays the leading role in organizing the relevant syntactic
elements in a sentence, including subject elements, adverbial elements, etc.
For some languages, such as English, word morphologies are valuable for
identifying predicate heads. However, Chinese offers no morphological
information to indicate words' grammatical roles. A Chinese sentence often
contains several verbal expressions; identifying the expression that plays the
role of the predicate head is not an easy task. Furthermore, Chinese sentences
are inattentive to structure and provide no delimitation between words.
Therefore, identifying Chinese predicate heads involves significant challenges.
In Chinese information extraction, little work has been performed in predicate
head recognition. No generally accepted evaluation dataset supports work in
this important area. This paper presents the first attempt to develop an
annotation guideline for Chinese predicate heads and their relevant syntactic
elements. This annotation guideline emphasizes the role of the predicate as the
structural center of a sentence. The design of relevant syntactic element
annotation also follows this principle. Many considerations are proposed to
achieve this goal, e.g., patterns of predicate heads, a flattened annotation
structure, and a simpler syntactic unit type. Based on the proposed annotation
guideline, more than 1,500 documents were manually annotated. The corpus will
be available online for public access. With this guideline and annotated
corpus, our goal is to broadly impact and advance the research in the area of
Chinese information extraction and to provide the research community with a
critical resource that has been lacking for a long time.
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis [8.883799596036484]
This paper presents a corpus-based analysis of argument structure errors in learner Chinese.
The data for analysis includes sentences produced by language learners as well as their corrections by native speakers.
We couple the data with semantic role labeling annotations that are manually created by two senior students.
arXiv Detail & Related papers (2023-08-17T21:10:04Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Discourse Context Predictability Effects in Hindi Word Order [14.88833412862455]
We investigate how the words and syntactic structures in a sentence influence the word order of the following sentences.
We use a number of discourse-based and cognitive features to make predictions, including dependency length, surprisal, and information status.
We find that information status and LSTM-based discourse predictability influence word order choices, especially for non-canonical object-fronted orders.
arXiv Detail & Related papers (2022-10-25T11:53:01Z)
- Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [91.49622922938681]
We present an automatic framework that discovers and visualizes descriptions of different aspects of grammar.
Specifically, we extract descriptions from a natural text corpus that answer questions about morphosyntax and semantics.
We apply this method for teaching the Indian languages, Kannada and Marathi, which, unlike English, do not have well-developed pedagogical resources.
arXiv Detail & Related papers (2022-06-10T14:52:22Z)
- Representing 'how you say' with 'what you say': English corpus of focused speech and text reflecting corresponding implications [10.103202030679844]
In speech communication, how something is said (paralinguistic information) is as crucial as what is said (linguistic information).
Current speech translation systems return the same translations if the utterances are linguistically identical.
We propose mapping paralinguistic information into the linguistic domain within the source language using lexical and grammatical devices.
arXiv Detail & Related papers (2022-03-29T12:29:22Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
arXiv Detail & Related papers (2022-03-25T20:37:30Z)
- An In-depth Study on Internal Structure of Chinese Words [34.864343591706984]
This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships.
We manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from Chinese Penn Treebank.
We present detailed and interesting analysis on WIST to reveal insights on Chinese word formation.
arXiv Detail & Related papers (2021-06-01T09:09:51Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- A Corpus of Adpositional Supersenses for Mandarin Chinese [15.757892250956715]
This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese.
Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria.
We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English.
arXiv Detail & Related papers (2020-03-18T18:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.