Is Japanese CCGBank empirically correct? A case study of passive and
causative constructions
- URL: http://arxiv.org/abs/2302.14708v1
- Date: Tue, 28 Feb 2023 16:19:24 GMT
- Authors: Daisuke Bekki and Hitomi Yanaka
- Abstract summary: We focus on the analysis of passive/causative constructions in the Japanese CCGBank.
We show that, together with the compositional semantics of ccg2lambda, a semantic parsing system, it yields empirically wrong predictions for the nested construction of passives and causatives.
- Score: 18.021287677546958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Japanese CCGBank serves as training and evaluation data for developing
Japanese CCG parsers. However, since it is automatically generated from the
Kyoto Corpus, a dependency treebank, its linguistic validity still needs to be
sufficiently verified. In this paper, we focus on the analysis of
passive/causative constructions in the Japanese CCGBank and show that, together
with the compositional semantics of ccg2lambda, a semantic parsing system, it
yields empirically wrong predictions for the nested construction of passives
and causatives.
Related papers
- Computational Semantics and Evaluation Benchmark for Interrogative
Sentences via Combinatory Categorial Grammar [8.666172545138275]
We present a compositional semantics for various types of polar questions and wh-questions within the framework of Combinatory Categorial Grammar (CCG).
We introduce a question-answering dataset QSEM specifically designed to evaluate the semantics of interrogative sentences.
arXiv Detail & Related papers (2023-12-22T14:46:02Z)
- FlaCGEC: A Chinese Grammatical Error Correction Dataset with
Fine-grained Linguistic Annotation [11.421545095092815]
FlaCGEC is a new CGEC dataset featuring fine-grained linguistic annotation.
We collect raw corpus from the linguistic schema defined by Chinese language experts, conduct edits on sentences via rules, and refine generated samples manually.
We evaluate various cutting-edge CGEC methods on the proposed FlaCGEC dataset and their unremarkable results indicate that this dataset is challenging in covering a large range of grammatical errors.
arXiv Detail & Related papers (2023-09-26T10:22:43Z)
- CSED: A Chinese Semantic Error Diagnosis Corpus [52.92010408053424]
We study the complicated problem of Chinese Semantic Error Diagnosis (CSED), which lacks relevant datasets.
The study of semantic errors is important because they are very common and may lead to syntactic irregularities or even problems of comprehension.
This paper proposes syntax-aware models to specifically adapt to the CSED task.
arXiv Detail & Related papers (2023-05-09T05:33:31Z)
- Improving Chinese Spelling Check by Character Pronunciation Prediction:
The Effects of Adaptivity and Granularity [76.20568599642799]
Chinese spelling check (CSC) is a fundamental NLP task that detects and corrects spelling errors in Chinese texts.
In this paper, we consider introducing an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC.
We propose SCOPE, which builds two parallel decoders on top of a shared encoder, one for the primary CSC task and the other for a fine-grained auxiliary CPP task.
arXiv Detail & Related papers (2022-10-20T03:42:35Z)
- CGELBank: CGEL as a Framework for English Syntax Annotation [11.042037758273226]
We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.
We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks.
arXiv Detail & Related papers (2022-10-01T23:44:06Z)
- MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese
Grammatical Error Correction [51.3754092853434]
MuCGEC is a multi-reference evaluation dataset for Chinese Grammatical Error Correction (CGEC).
It consists of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources.
Each sentence has been corrected by three annotators, and their corrections are meticulously reviewed by an expert, resulting in 2.3 references per sentence.
arXiv Detail & Related papers (2022-04-23T05:20:38Z)
- Improving Pre-trained Language Models with Syntactic Dependency
Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- Semantic Construction Grammar: Bridging the NL / Logic Divide [3.245535754791156]
We discuss Semantic Construction Grammar (SCG), a system developed to facilitate translation between natural language and logical representations.
SCG is designed to support a variety of different methods of representation, ranging from those that are fairly close to the NL structure to those that are quite different from the NL structure.
arXiv Detail & Related papers (2021-12-10T00:02:40Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree
Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public GEC benchmarks, where it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- Co-training an Unsupervised Constituency Parser with Weak Supervision [33.63314110665062]
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence.
We show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse.
arXiv Detail & Related papers (2021-10-05T18:45:06Z)
- Constructing a Family Tree of Ten Indo-European Languages with
Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.