Building an Endangered Language Resource in the Classroom: Universal
Dependencies for Kakataibo
- URL: http://arxiv.org/abs/2206.10343v1
- Date: Tue, 21 Jun 2022 12:58:56 GMT
- Title: Building an Endangered Language Resource in the Classroom: Universal
Dependencies for Kakataibo
- Authors: Roberto Zariquiey, Claudia Alvarado, Ximena Echevarria, Luisa Gomez,
Rosa Gonzales, Mariana Illescas, Sabina Oporto, Frederic Blum, Arturo
Oncevay, Javier Vera
- Abstract summary: We launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru.
We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates.
- Score: 0.8938910048099864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we launch a new Universal Dependencies treebank for an
endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru.
We first discuss the collaborative methodology implemented, which proved
effective for creating a treebank in the context of a Computational Linguistics
course for undergraduates. Then, we describe the general details of the
treebank and the language-specific considerations implemented for the proposed
annotation. We finally conduct some experiments on part-of-speech tagging and
syntactic dependency parsing. We focus on monolingual and transfer learning
settings, where we study the impact of a Shipibo-Konibo treebank, another
Panoan language resource.
Related papers
- Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study [0.0]
We describe a process to automatically generate language exercises and questions from a dependency treebank and a lexical database for Tupian languages.
We conclude that new data gathering processes should be established in partnership with indigenous communities and oriented for educational purposes.
arXiv Detail & Related papers (2024-03-21T16:11:44Z)
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
- JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset [7.940548890754674]
JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois.
Many of the most-spoken low-resource languages are creoles.
Our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages.
arXiv Detail & Related papers (2022-12-07T03:07:02Z)
- LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools
for Sentiment Analysis as Semantic Dependency Parsing [10.355938901584567]
This paper addresses the problem of structured sentiment analysis using a bi-affine semantic dependency parser.
For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages.
For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, unlikely-grammatical, but annotated data.
In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English tree
arXiv Detail & Related papers (2022-04-27T10:21:28Z)
- Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose a novel way to establish a link by corpus transfer between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Apurinã Universal Dependencies Treebank [0.4893345190925178]
This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language.
The treebank contains 76 fully annotated sentences, applies 14 parts-of-speech, as well as seven augmented or new features.
arXiv Detail & Related papers (2021-06-07T07:42:00Z)
- Automatically Identifying Language Family from Acoustic Examples in Low
Resource Scenarios [48.57072884674938]
We propose a method to analyze language similarity using deep learning.
Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language family findings.
arXiv Detail & Related papers (2020-12-01T22:44:42Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- A Summary of the First Workshop on Language Technology for Language
Documentation and Revitalization [70.14668193220528]
In August 2019, a workshop was held at Carnegie Mellon University to attempt to bring together language community members, documentary linguists, and technologists.
This paper reports the results of the workshop, including issues discussed, and various conceived and implemented technologies for nine languages.
arXiv Detail & Related papers (2020-04-27T22:55:55Z)
- Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank
and the BoAT Annotation Tool [0.0]
We introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank)
Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework.
We report the results of a state-of-the-art dependency annotation obtained over the BOUN Treebank as well as two other treebanks in Turkish.
arXiv Detail & Related papers (2020-02-24T17:59:11Z)