CGELBank: CGEL as a Framework for English Syntax Annotation
- URL: http://arxiv.org/abs/2210.00394v1
- Date: Sat, 1 Oct 2022 23:44:06 GMT
- Title: CGELBank: CGEL as a Framework for English Syntax Annotation
- Authors: Brett Reynolds, Aryaman Arora, Nathan Schneider
- Abstract summary: We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.
We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks.
- Score: 11.042037758273226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the syntactic formalism of the Cambridge Grammar of the
English Language (CGEL) to the world of treebanking through the CGELBank
project. We discuss some issues in linguistic analysis that arose in adapting
the formalism to corpus annotation, followed by quantitative and qualitative
comparisons with parallel UD and PTB treebanks. We argue that CGEL provides a
good tradeoff between comprehensiveness of analysis and usability for
annotation, which motivates expanding the treebank with automatic conversion in
the future.
Related papers
- Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues [66.69453609603875]
Sociocultural norms serve as guiding principles for personal conduct in social interactions.
We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs).
We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase.
arXiv Detail & Related papers (2024-10-04T00:08:46Z) - Multi-perspective Improvement of Knowledge Graph Completion with Large
Language Models [95.31941227776711]
We propose MPIKGC to compensate for the deficiency of contextualized knowledge and improve KGC by querying large language models (LLMs).
We conducted extensive evaluation of our framework based on four description-based KGC models and four datasets, for both link prediction and triplet classification tasks.
arXiv Detail & Related papers (2024-03-04T12:16:15Z) - CGELBank Annotation Manual v1.1 [8.78380676369991]
CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language.
This document lays out the particularities of the CGELBank annotation scheme.
arXiv Detail & Related papers (2023-05-27T03:01:53Z) - Is Japanese CCGBank empirically correct? A case study of passive and
causative constructions [18.021287677546958]
We focus on the analysis of passive/causative constructions in the Japanese CCGBank.
We show that, together with the compositional semantics of ccg2lambda, a semantic parsing system, it yields empirically wrong predictions for the nested construction of passives and causatives.
arXiv Detail & Related papers (2023-02-28T16:19:24Z) - Multilingual Extraction and Categorization of Lexical Collocations with
Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z) - LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools
for Sentiment Analysis as Semantic Dependency Parsing [10.355938901584567]
This paper addresses the problem of structured sentiment analysis using a bi-affine semantic dependency parser.
For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages.
For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, unlikely-grammatical, but annotated data.
In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English tree
arXiv Detail & Related papers (2022-04-27T10:21:28Z) - From Sentiment Annotations to Sentiment Prediction through Discourse
Augmentation [30.615883375573432]
We propose a novel framework to exploit task-related discourse for the task of sentiment analysis.
More specifically, we are combining the large-scale, sentiment-dependent MEGA-DT treebank with a novel neural architecture for sentiment prediction.
Experiments show that our framework using sentiment-related discourse augmentations for sentiment prediction enhances the overall performance for long documents.
arXiv Detail & Related papers (2020-11-05T18:28:13Z) - MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable
Distant Sentiment Supervision [30.615883375573432]
We present a novel methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets.
Our approach generates trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient beam-search strategy.
Experiments indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains.
arXiv Detail & Related papers (2020-11-05T18:22:38Z) - Treebanking User-Generated Content: a UD Based Overview of Guidelines,
Corpora and Unified Recommendations [58.50167394354305]
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media.
It proposes a set of tentative UD-based annotation guidelines to promote consistent treatment of the particular phenomena found in these types of texts.
arXiv Detail & Related papers (2020-11-03T23:34:42Z) - Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z) - Multilingual Alignment of Contextual Word Representations [49.42244463346612]
BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
arXiv Detail & Related papers (2020-02-10T03:27:21Z)