Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of
Turkish
- URL: http://arxiv.org/abs/2207.11782v1
- Date: Sun, 24 Jul 2022 17:56:27 GMT
- Title: Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of
Turkish
- Authors: Büşra Marşan, Salih Furkan Akkurt, Muhammet Şen, Merve
Gürbüz, Onur Güngör, Şaziye Betül Özateş, Suzan
Üsküdarlı, Arzucan Özgür, Tunga Güngör, Balkız Öztürk
- Abstract summary: We aim to resolve the issues of the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank without diverging from the Universal Dependencies framework.
New annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) tab in the UD framework to denote derivation.
Representational capabilities of the re-annotated treebank were tested on an LSTM-based dependency parser, and an updated version of the BoAT Tool is introduced.
- Score: 0.6514569292630354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we aim to offer linguistically motivated solutions to resolve
the issues of the lack of representation of null morphemes, highly productive
derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank
without diverging from the Universal Dependencies framework.
In order to tackle these issues, new annotation conventions were introduced
by splitting certain lemmas and employing the MISC (miscellaneous) tab in the
UD framework to denote derivation. Representational capabilities of the
re-annotated treebank were tested on an LSTM-based dependency parser, and an
updated version of the BoAT Tool is introduced.
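The role of the MISC field mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the standard ten-column, tab-separated CoNLL-U token line used by Universal Dependencies, in which MISC is the last column and holds `|`-separated `Key=Value` items. The key name `Deriv`, its value format, and the example token are hypothetical and chosen only for illustration; they are not taken from the paper, whose actual annotation convention may differ.

```python
# Minimal sketch: read one CoNLL-U token line and expose its MISC column.
# Assumption: the "Deriv" key below is invented for illustration only and is
# not the key used in the BOUN Treebank.

CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_misc(misc):
    """Turn a MISC string such as 'A=1|B=2' into a dict; '_' means empty."""
    if misc == "_":
        return {}
    pairs = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        # Items without '=' are kept as bare flags with value None.
        pairs[key] = value if sep else None
    return pairs


def parse_token_line(line):
    """Split a tab-separated CoNLL-U token line into a field dictionary."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    token = dict(zip(CONLLU_FIELDS, columns))
    token["misc"] = parse_misc(token["misc"])
    return token


# Hypothetical token: "gözlük" (eyeglasses), derived from "göz" (eye).
example_line = "\t".join([
    "1", "gözlük", "gözlük", "NOUN", "_",
    "Case=Nom|Number=Sing", "0", "root", "_",
    "Deriv=göz+lIk|SpaceAfter=No",  # "Deriv" is a made-up key
])

token = parse_token_line(example_line)
print(token["form"], token["misc"].get("Deriv"))
```

Keeping derivational information in MISC leaves the other nine columns untouched, so tools that ignore MISC can still read the file as ordinary UD data.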
Related papers
- Thai Universal Dependency Treebank [0.0]
We introduce the Thai Universal Dependency Treebank (TUD), a new Thai treebank, the largest to date, consisting of 3,627 trees annotated in accordance with the Universal Dependencies (UD) framework.
We then benchmark dependency parsing models that incorporate pretrained encoders and train them on Thai-PUD and our TUD.
The results show that most of our models can outperform other models reported in previous papers and provide insight into the optimal choices of components in Thai dependency parsers.
arXiv Detail & Related papers (2024-05-13T09:48:13Z)
- Negation Triplet Extraction with Syntactic Dependency and Semantic Consistency [37.99421732397288]
SSENE is built on a generative pretrained language model (PLM) with an Encoder-Decoder architecture and a multi-task learning framework.
We have constructed a high-quality Chinese dataset NegComment based on the users' reviews from the real-world platform of Meituan.
arXiv Detail & Related papers (2024-04-15T14:28:33Z)
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z)
- Dependency Annotation of Ottoman Turkish with Multilingual BERT [0.0]
This study introduces a pretrained large language model-based annotation methodology for the first dependency treebank in Ottoman Turkish.
The resulting treebank will facilitate automated analysis of Ottoman Turkish documents, unlocking the linguistic richness embedded in this historical heritage.
arXiv Detail & Related papers (2024-02-22T17:58:50Z)
- Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z)
- Constructing Code-mixed Universal Dependency Forest for Unbiased
Cross-lingual Relation Extraction [92.84968716013783]
Cross-lingual relation extraction (XRE) aggressively leverages the language-consistent structural features from the universal dependency (UD) resource.
We investigate an unbiased UD-based XRE transfer by constructing a type of code-mixed UD forest.
With such forest features, the gaps of UD-based XRE between the training and predicting phases can be effectively closed.
arXiv Detail & Related papers (2023-05-20T18:24:06Z)
- CGELBank: CGEL as a Framework for English Syntax Annotation [11.042037758273226]
We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.
We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks.
arXiv Detail & Related papers (2022-10-01T23:44:06Z)
- Treebanking User-Generated Content: a UD Based Overview of Guidelines,
Corpora and Unified Recommendations [58.50167394354305]
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media.
It proposes a set of tentative UD-based annotation guidelines to promote consistent treatment of the particular phenomena found in these types of texts.
arXiv Detail & Related papers (2020-11-03T23:34:42Z)
- Reference Language based Unsupervised Neural Machine Translation [108.64894168968067]
Unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse.
We propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source.
Experimental results show that our methods improve the quality of UNMT over that of a strong baseline that uses only one auxiliary language.
arXiv Detail & Related papers (2020-04-05T08:28:08Z)
- Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank
and the BoAT Annotation Tool [0.0]
We introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank).
Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework.
We report the results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish.
arXiv Detail & Related papers (2020-02-24T17:59:11Z)