Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis
- URL: http://arxiv.org/abs/2308.09186v1
- Date: Thu, 17 Aug 2023 21:10:04 GMT
- Title: Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis
- Authors: Yuguang Duan, Zi Lin, Weiwei Sun
- Abstract summary: This paper presents a corpus-based analysis of argument structure errors in learner Chinese.
The data for analysis includes sentences produced by language learners as well as their corrections by native speakers.
We couple the data with semantic role labeling annotations that are manually created by two senior students.
- Score: 8.883799596036484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a corpus-based analysis of argument structure errors in
learner Chinese. The data for analysis includes sentences produced by language
learners as well as their corrections by native speakers. We couple the data
with semantic role labeling annotations that are manually created by two senior
students who both major in Applied Linguistics. The annotation procedure is
guided by the Chinese PropBank specification, which was originally developed to
cover first-language phenomena. Nevertheless, we find that it is comprehensive
enough to handle second-language phenomena as well. The inter-annotator
agreement is rather high, suggesting that learner texts are understandable to
native speakers. Based on our annotations, we present a preliminary analysis of
competence errors related to argument structure. In particular, speech errors
related to word order, word selection, lack of proposition, and
argument-adjunct confounding are discussed.
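The abstract reports rather high inter-annotator agreement but does not say how it was measured. For semantic role labeling, one common choice is to treat one annotator's labels as the reference and score the other's with precision, recall, and F1 over labeled arguments. The sketch below assumes exact matching on (predicate, span, role) triples. The `Arg` type, the `agreement_f1` function, and the example labels are hypothetical illustrations, not the authors' procedure.

```python
from __future__ import annotations

from typing import NamedTuple


class Arg(NamedTuple):
    """One labeled argument of a predicate (hypothetical representation)."""
    predicate: int  # token index of the predicate
    start: int      # first token of the argument span (inclusive)
    end: int        # end of the argument span (exclusive)
    role: str       # PropBank-style label, e.g. "ARG0", "ARG1", "ARGM-LOC"


def agreement_f1(reference: set[Arg], other: set[Arg]) -> tuple[float, float, float]:
    """Score `other` against `reference` with exact (predicate, span, role) matching.

    Returns (precision, recall, F1). F1 is symmetric, so swapping the two
    annotators swaps precision and recall but leaves F1 unchanged.
    """
    matched = len(reference & other)  # arguments both annotators labeled identically
    precision = matched / len(other) if other else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotations of one sentence whose predicate is token 1.
    annotator_a = {Arg(1, 0, 1, "ARG0"), Arg(1, 2, 4, "ARG1"), Arg(1, 4, 6, "ARGM-LOC")}
    # Annotator B labels the third phrase as a core argument instead of an adjunct.
    annotator_b = {Arg(1, 0, 1, "ARG0"), Arg(1, 2, 4, "ARG1"), Arg(1, 4, 6, "ARG2")}
    p, r, f1 = agreement_f1(annotator_a, annotator_b)
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # → P=0.667 R=0.667 F1=0.667
```

In this hypothetical example, the two annotators disagree only on whether one phrase is a locative adjunct or a core argument. This is the kind of argument-adjunct confounding the abstract mentions, and it lowers F1 to 2/3. Exact span matching is the strictest option; a looser variant could match on argument heads instead.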
Related papers
- To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese [26.659122101710068]
We study whether and why a particular argument should be omitted across over 2,000 data points in the balanced corpus of Japanese.
The data indicate that native speakers overall share common criteria for such judgments.
The analysis reveals gaps between the systems' predictions and human judgments on specific linguistic aspects.
(2024-04-17T12:26:52Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
(2023-07-16T15:18:25Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
(2023-05-23T23:45:20Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
(2022-11-04T12:56:12Z)
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
(2022-10-19T10:06:03Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
(2022-04-15T13:55:32Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
(2022-03-25T20:37:30Z)
- Controlled Evaluation of Grammatical Knowledge in Mandarin Chinese Language Models [22.57309958548928]
We investigate whether structural supervision improves language models' ability to learn grammatical dependencies in typologically different languages.
We train LSTMs, Recurrent Neural Network Grammars, Transformer language models, and generative parsing models on datasets of different sizes.
We find suggestive evidence that structural supervision helps with representing syntactic state across intervening content and improves performance in low-data settings.
(2021-09-22T22:11:30Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
(2021-09-13T21:05:37Z)
- Multilingual Neural RST Discourse Parsing [24.986030179701405]
We investigate two approaches to establishing a neural, cross-lingual discourse parser: multilingual vector representations and segment-level translation.
Experimental results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing.
(2020-12-03T05:03:38Z)
- A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions [35.24301299033675]
We propose a framework for investigating fine-grained language understanding in visually grounded dialogues.
We focus on the OneCommon Corpus (Udagawa and Aizawa, 2019; 2020), a simple yet challenging common grounding dataset.
We analyze their linguistic structures based on spatial expressions and provide comprehensive and reliable annotations for 600 dialogues.
(2020-10-07T02:50:38Z)