Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- URL: http://arxiv.org/abs/2004.10643v1
- Date: Wed, 22 Apr 2020 15:38:18 GMT
- Title: Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Authors: Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman
- Abstract summary: Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages.
We describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.
- Score: 33.86322085911299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Universal Dependencies is an open community effort to create
cross-linguistically consistent treebank annotation for many languages within a
dependency-based lexicalist framework. The annotation consists in a
linguistically motivated word segmentation; a morphological layer comprising
lemmas, universal part-of-speech tags, and standardized morphological features;
and a syntactic layer focusing on syntactic relations between predicates,
arguments and modifiers. In this paper, we describe version 2 of the guidelines
(UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of
the currently available treebanks for 90 languages.
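The annotation layers described in the abstract (word segmentation, lemmas, universal POS tags, morphological features, and dependency relations) are distributed in UD treebanks using the tab-separated CoNLL-U format. As a minimal sketch of how those layers map onto the format's ten columns, the helper below parses one sentence block; the function and example sentence are illustrative, not part of any official UD tooling.

```python
# Minimal sketch: reading one sentence in the CoNLL-U format used by UD
# treebanks. The ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS,
# FEATS, HEAD, DEPREL, DEPS, MISC) carry the layers named in the abstract:
# lemmas, universal POS tags, and morphological features (morphology), plus
# head index and relation label (syntax).

CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

def parse_conllu_sentence(block: str):
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):   # sentence-level comment lines
            continue
        cols = line.split("\t")
        tokens.append(dict(zip(CONLLU_FIELDS, cols)))
    return tokens

# Illustrative three-token sentence; "_" marks an empty field.
example = """\
# text = Dogs bark.
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

tokens = parse_conllu_sentence(example)
print(tokens[0]["upos"], tokens[1]["deprel"])  # NOUN root
```

Note that HEAD is a 1-based token index with 0 reserved for the root, so the syntactic tree can be reconstructed directly from the HEAD and DEPREL columns.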
Related papers
- UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies [40.202120178465]
In UD, grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements are not labeled holistically.
We argue for augmenting UD annotations with a 'UCxn' annotation layer for such meaning-bearing grammatical constructions.
As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns.
arXiv Detail & Related papers (2024-03-26T14:40:10Z)
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z)
- A Compositional Typed Semantics for Universal Dependencies [26.65442947858347]
We introduce UD Type Calculus, a compositional, principled, and language-independent system of semantic types and logical forms for lexical items.
We explain the essential features of UD Type Calculus, which all involve giving dependency relations denotations just like those of words.
We present results on a large existing corpus of sentences and their logical forms, showing that UD-TC can produce meanings comparable with our baseline.
arXiv Detail & Related papers (2024-03-02T11:58:24Z)
- UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years.
Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages.
In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
arXiv Detail & Related papers (2022-05-07T09:19:02Z)
- Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech [27.657676278734534]
This paper proposes a methodology for constructing such corpora of child directed speech paired with sentential logical forms.
The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing.
arXiv Detail & Related papers (2021-09-22T18:17:06Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- Finding Universal Grammatical Relations in Multilingual BERT [47.74015366712623]
We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English.
We present an unsupervised analysis method that provides evidence mBERT learns representations of syntactic dependency labels.
arXiv Detail & Related papers (2020-05-09T20:46:02Z)
- Do Neural Language Models Show Preferences for Syntactic Formalisms? [14.388237635684737]
We study the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis.
We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages.
We find that both models exhibit a preference for UD over SUD - with interesting variations across languages and layers.
arXiv Detail & Related papers (2020-04-29T11:37:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.