Penn-Helsinki Parsed Corpus of Early Modern English: First Parsing
Results and Analysis
- URL: http://arxiv.org/abs/2112.08532v1
- Date: Wed, 15 Dec 2021 23:56:21 GMT
- Authors: Seth Kulick, Neville Ryant, Beatrice Santorini
- Abstract summary: We present the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.9 million word treebank.
We describe key features of PPCEME that make it challenging for parsing, including a larger and more varied set of function tags than in the Penn Treebank.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the first parsing results on the Penn-Helsinki Parsed Corpus of
Early Modern English (PPCEME), a 1.9 million word treebank that is an important
resource for research in syntactic change. We describe key features of PPCEME
that make it challenging for parsing, including a larger and more varied set of
function tags than in the Penn Treebank. We present results for this corpus
using a modified version of the Berkeley Neural Parser and the approach to
function tag recovery of Gabbard et al. (2006). Despite its simplicity, this
approach works surprisingly well, suggesting it is possible to recover the
original structure with sufficient accuracy to support linguistic applications
(e.g., searching for syntactic structures of interest). However, for a subset
of function tags (e.g., the tag indicating direct speech), additional work is
needed, and we discuss some further limits of this approach. The resulting
parser will be used to parse Early English Books Online, a 1.1 billion word
corpus whose utility for the study of syntactic change will be greatly
increased with the addition of accurate parse trees.
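The abstract's envisioned linguistic application, searching parsed text for syntactic structures of interest, can be illustrated with a minimal sketch. This is not the authors' code, and the toy tree below is invented; it only shows the general idea of locating constituents that carry a Penn-style function tag (e.g. NP-SBJ for subjects) in a bracketed parse.

```python
def tokenize(s):
    """Split a Penn-style bracketed tree string into tokens."""
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens, i):
    """Build a (label, children) tuple starting at tokens[i] == '('."""
    label = tokens[i + 1]
    children = []
    i += 2
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
            children.append(child)
        else:
            children.append(tokens[i])  # leaf word
            i += 1
    return (label, children), i + 1

def with_function_tag(node, tag):
    """Yield every constituent whose label carries the given function tag."""
    if isinstance(node, tuple):
        label, children = node
        if tag in label.split("-")[1:]:
            yield node
        for child in children:
            yield from with_function_tag(child, tag)

# Invented toy tree in the spirit of PPCEME annotation (not real corpus data).
tree = "(S (NP-SBJ (PRO I)) (VP (VBD sawe) (NP-OB1 (N light))))"
root, _ = parse(tokenize(tree), 0)
subjects = list(with_function_tag(root, "SBJ"))
print(subjects)  # [('NP-SBJ', [('PRO', ['I'])])]
```

A query like this is only as good as the function tags on the parse trees, which is why the paper's finding that most function tags can be recovered accurately matters for downstream corpus search.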
Related papers
- Integrating Supertag Features into Neural Discontinuous Constituent Parsing [0.0]
Traditional views of constituency demand that constituents consist of adjacent words, which excludes the discontinuous constituents common in languages like German.
Transition-based parsing produces trees given raw text input using supervised learning on large annotated corpora.
arXiv Detail & Related papers (2024-10-11T12:28:26Z) - Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective [0.0]
We use dependency parsing to analyze news articles in Urdu.
We achieve a best-labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%.
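As a refresher on these two standard dependency-parsing metrics (a generic illustration, not the paper's data): UAS counts a token as correct when its predicted head matches the gold head, while LAS additionally requires the dependency label to match.

```python
# Toy UAS/LAS computation over a three-token sentence (illustrative data).
# Each token is (head_index, relation); head index 0 is the artificial root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (1, "obj")]  # wrong head on token 3

uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"UAS={uas:.2f}  LAS={las:.2f}")  # UAS=0.67  LAS=0.67
```

Since a wrong head also makes the labeled attachment wrong, LAS can never exceed UAS, which is consistent with the 70% LA versus 84% UAS reported above.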
arXiv Detail & Related papers (2024-06-13T19:30:32Z) - MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z) - Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.
Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other.
We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
arXiv Detail & Related papers (2023-06-08T18:02:07Z) - A Biologically Plausible Parser [1.8563342761346613]
We describe a parser of English effectuated by biologically plausible neurons and synapses.
We demonstrate that this device is capable of correctly parsing reasonably nontrivial sentences.
arXiv Detail & Related papers (2021-08-04T17:27:06Z) - Strongly Incremental Constituency Parsing with Graph Neural Networks [70.16880251349093]
Parsing sentences into syntax trees can benefit downstream applications in NLP.
Transition-based parsers build trees by executing actions in a state transition system.
Existing transition-based parsers are predominantly based on the shift-reduce transition system.
arXiv Detail & Related papers (2020-10-27T19:19:38Z) - A Survey of Unsupervised Dependency Parsing [62.16714720135358]
Unsupervised dependency parsing aims to learn a dependency parser from sentences that have no annotation of their correct parse trees.
Despite its difficulty, unsupervised parsing is an interesting research direction because of its capability of utilizing almost unlimited unannotated text data.
arXiv Detail & Related papers (2020-10-04T10:51:22Z) - A Survey of Syntactic-Semantic Parsing Based on Constituent and
Dependency Structures [14.714725860010724]
We focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing.
This article briefly reviews the representative models of constituent parsing and dependency parsing, and also dependency parsing with rich semantics.
arXiv Detail & Related papers (2020-06-19T10:21:17Z) - A Tale of a Probe and a Parser [74.14046092181947]
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.