Apurinã Universal Dependencies Treebank
- URL: http://arxiv.org/abs/2106.03391v1
- Date: Mon, 7 Jun 2021 07:42:00 GMT
- Title: Apurinã Universal Dependencies Treebank
- Authors: Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva
Facundes, Mika Hämäläinen, Niko Partanen
- Abstract summary: This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language.
The treebank contains 76 fully annotated sentences, applies 14 parts-of-speech, as well as seven augmented or new features.
- Score: 0.4893345190925178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents and discusses the first Universal Dependencies treebank
for the Apurinã language. The treebank contains 76 fully annotated sentences,
applies 14 parts-of-speech, as well as seven augmented or new features, some
of which are unique to Apurinã. The construction of the treebank has also
served as an opportunity to develop a finite-state description of the language
and facilitate the transfer of open-source infrastructure possibilities to an
endangered language of the Amazon. The source materials used in the initial
treebank represent fieldwork practices where not all tokens of all sentences
are equally annotated. For this reason, establishing regular annotation
practices for the entire Apurinã treebank is an ongoing project.
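The counts quoted in the abstract (sentences, distinct parts-of-speech) are the kind of statistics that can be read straight off a treebank file. Universal Dependencies treebanks are distributed in the tab-separated CoNLL-U format, so a minimal sketch might look like the following. The function name `treebank_stats` and the sample text are hypothetical illustrations, and the sample tokens are placeholders, not real Apurinã data.

```python
from collections import Counter

# Tiny made-up CoNLL-U sample (placeholder tokens, NOT real Apurinã data).
SAMPLE = """\
# sent_id = demo-1
# text = aa bb
1\taa\taa\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tbb\tbb\tVERB\t_\t_\t0\troot\t_\t_

# sent_id = demo-2
# text = cc
1\tcc\tcc\tPRON\t_\t_\t0\troot\t_\t_
"""


def treebank_stats(conllu_text):
    """Return (sentence_count, Counter of UPOS tags) for CoNLL-U text."""
    sentences = 0
    upos = Counter()
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # comment / metadata line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed 10-column token line
        if not in_sentence:
            sentences += 1
            in_sentence = True
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        upos[cols[3]] += 1  # column 4 holds the universal POS tag
    return sentences, upos


if __name__ == "__main__":
    n_sentences, tags = treebank_stats(SAMPLE)
    print(n_sentences, len(tags), dict(tags))
```

Run against an actual treebank file (read with `encoding="utf-8"`), the first value would be the sentence count and `len(tags)` the number of distinct parts-of-speech in use.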
Related papers
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
- Structured Dialogue Discourse Parsing [79.37200787463917]
Dialogue discourse parsing aims to uncover the internal structure of a multi-participant conversation.
We propose a principled method that improves upon previous work from two perspectives: encoding and decoding.
Experiments show that our method achieves new state-of-the-art, surpassing the previous model by 2.3 on STAC and 1.5 on Molweni.
arXiv Detail & Related papers (2023-06-26T22:51:01Z)
- RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees [47.745218107037786]
We propose RLET, a Reinforcement Learning based Entailment Tree generation framework.
RLET iteratively performs single step reasoning with sentence selection and deduction generation modules.
Experiments on three settings of the EntailmentBank dataset demonstrate the strength of using an RL framework.
arXiv Detail & Related papers (2022-10-31T06:45:05Z)
- Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo [0.8938910048099864]
We launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru.
We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates.
arXiv Detail & Related papers (2022-06-21T12:58:56Z)
- Universal Dependency Treebank for Odia Language [0.24466725954625887]
This paper presents the first publicly available treebank of Odia, a morphologically rich low resource Indian language.
The treebank contains approx. 1082 tokens (100 sentences) in Odia selected from "Samantar", the largest available parallel corpora collection for Indic languages.
The morphological analysis of the Odia treebank was performed using machine learning techniques.
arXiv Detail & Related papers (2022-05-24T11:19:26Z)
- LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing [10.355938901584567]
This paper addresses the problem of structured sentiment analysis using a bi-affine semantic dependency parser.
For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages.
For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, unlikely-grammatical, but annotated data.
In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English tree
arXiv Detail & Related papers (2022-04-27T10:21:28Z)
- Developing Universal Dependency Treebanks for Magahi and Braj [0.7349727826230861]
In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj.
The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences, marked with their lemmas, part-of-speech, morphological features and universal dependencies.
arXiv Detail & Related papers (2022-04-26T23:43:41Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Strongly Incremental Constituency Parsing with Graph Neural Networks [70.16880251349093]
Parsing sentences into syntax trees can benefit downstream applications in NLP.
Transition-based parsers build trees by executing actions in a state transition system.
Existing transition-based parsers are predominantly based on the shift-reduce transition system.
arXiv Detail & Related papers (2020-10-27T19:19:38Z)
- Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z)
- Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool [0.0]
We introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank).
Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework.
We report the results of a state-of-the-art dependency annotation obtained over the BOUN Treebank as well as two other treebanks in Turkish.
arXiv Detail & Related papers (2020-02-24T17:59:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.