Mischievous Nominal Constructions in Universal Dependencies
- URL: http://arxiv.org/abs/2108.12928v1
- Date: Sun, 29 Aug 2021 22:30:15 GMT
- Title: Mischievous Nominal Constructions in Universal Dependencies
- Authors: Nathan Schneider, Amir Zeldes
- Abstract summary: This paper surveys the kinds of mischievous nominal expressions attested in English Universal Dependencies corpora.
It proposes solutions designed primarily with English in mind, which may nonetheless offer paths to solutions for a variety of UD languages.
- Score: 12.767193946205799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the highly multilingual Universal Dependencies (UD) project provides
extensive guidelines for clausal structure as well as structure within
canonical nominal phrases, a standard treatment is lacking for many
"mischievous" nominal phenomena that break the mold. As a result, numerous
inconsistencies within and across corpora can be found, even in languages with
extensive UD treebanking work, such as English. This paper surveys the kinds of
mischievous nominal expressions attested in English UD corpora and proposes
solutions primarily with English in mind, but which may offer paths to
solutions for a variety of UD languages.
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies [40.202120178465]
Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements are not labeled holistically.
We argue for augmenting UD annotations with a 'UCxn' annotation layer for such meaning-bearing grammatical constructions.
As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns.
arXiv Detail & Related papers (2024-03-26T14:40:10Z) - MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
- A Compositional Typed Semantics for Universal Dependencies [26.65442947858347]
We introduce UD Type Calculus, a compositional, principled, and language-independent system of semantic types and logical forms for lexical items.
We explain the essential features of UD Type Calculus, which all involve giving dependency relations denotations just like those of words.
We present results on a large existing corpus of sentences and their logical forms, showing that UD-TC can produce meanings comparable with our baseline.
arXiv Detail & Related papers (2024-03-02T11:58:24Z)
- DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules [64.93179829965072]
DADA is a modular approach to imbue SAE-trained models with multi-dialectal robustness.
We show that DADA is effective for both single-task and instruction-finetuned language models.
arXiv Detail & Related papers (2023-05-22T18:43:31Z)
- Constructing Code-mixed Universal Dependency Forest for Unbiased Cross-lingual Relation Extraction [92.84968716013783]
Cross-lingual relation extraction (XRE) aggressively leverages the language-consistent structural features from the universal dependency (UD) resource.
We investigate an unbiased UD-based XRE transfer by constructing a type of code-mixed UD forest.
With such forest features, the gaps of UD-based XRE between the training and predicting phases can be effectively closed.
arXiv Detail & Related papers (2023-05-20T18:24:06Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation [0.04199844472131922]
We present and apply a GUI-assisted framework that allows us to easily probe a massive number of languages.
Most of the regularities revealed in the mBERT model are typical of Western European languages.
Our framework can be integrated with the existing probing toolboxes, model cards, and leaderboards.
arXiv Detail & Related papers (2022-10-24T13:41:17Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding linguistic units of different levels in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
- Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection [33.86322085911299]
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages.
We describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.
arXiv Detail & Related papers (2020-04-22T15:38:18Z)
- Cross-Lingual Adaptation Using Universal Dependencies [1.027974860479791]
We show that models trained using UD parse trees for complex NLP tasks can characterize very different languages.
Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages.
arXiv Detail & Related papers (2020-03-24T13:04:06Z)