TGIF: Tree-Graph Integrated-Format Parser for Enhanced UD with Two-Stage
Generic- to Individual-Language Finetuning
- URL: http://arxiv.org/abs/2107.06907v1
- Date: Wed, 14 Jul 2021 18:00:08 GMT
- Title: TGIF: Tree-Graph Integrated-Format Parser for Enhanced UD with Two-Stage
Generic- to Individual-Language Finetuning
- Authors: Tianze Shi, Lillian Lee
- Abstract summary: We present our contribution to the IWPT 2021 shared task on parsing into enhanced Universal Dependencies.
Our main system component is a hybrid tree-graph parser that integrates predictions of spanning trees for the enhanced graphs with additional graph edges not present in the spanning trees.
- Score: 18.71574180551552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present our contribution to the IWPT 2021 shared task on parsing into
enhanced Universal Dependencies. Our main system component is a hybrid
tree-graph parser that integrates (a) predictions of spanning trees for the
enhanced graphs with (b) additional graph edges not present in the spanning
trees. We also adopt a finetuning strategy where we first train a
language-generic parser on the concatenation of data from all available
languages, and then, in a second step, finetune on each individual language
separately. Additionally, we develop our own complete set of pre-processing
modules relevant to the shared task, including tokenization, sentence
segmentation, and multiword token expansion, based on pre-trained XLM-R models
and our own pre-training of character-level language models. Our submission
reaches a macro-average ELAS of 89.24 on the test set. It ranks top among all
teams, with a margin of more than 2 absolute ELAS over the next best-performing
submission, and best score on 16 out of 17 languages.
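The core idea of the hybrid tree-graph parser, combining (a) a predicted spanning tree with (b) extra high-scoring edges, can be illustrated with a minimal sketch. This is not the authors' code: the edge scores, the threshold, and the externally decoded spanning tree are all illustrative assumptions.

```python
def build_enhanced_graph(tree_edges, edge_scores, threshold=0.5):
    """Union a spanning tree with high-scoring non-tree edges.

    tree_edges: set of (head, dependent) pairs forming a predicted
        spanning tree (assumed to come from a separate tree decoder).
    edge_scores: dict mapping (head, dependent) -> score in [0, 1].
    Returns the enhanced graph as a set of edges.
    """
    graph = set(tree_edges)
    for edge, score in edge_scores.items():
        # Keep any additional edge whose score clears the threshold;
        # tree edges are retained unconditionally.
        if edge not in graph and score > threshold:
            graph.add(edge)
    return graph
```

For example, given the tree `{(0, 1), (1, 2), (1, 3)}` and a non-tree edge `(3, 2)` scored at 0.6, the enhanced graph contains all four edges, while a low-scoring candidate like `(2, 3)` at 0.2 is dropped.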
Related papers
- CompoundPiece: Evaluating and Improving Decompounding Performance of
Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z) - EAG: Extract and Generate Multi-way Aligned Corpus for Complete Multi-lingual Neural Machine Translation [63.88541605363555]
"Extract and Generate" (EAG) is a two-step approach to construct large-scale and high-quality multi-way aligned corpus from bilingual data.
We first extract candidate aligned examples by pairing the bilingual examples from different language pairs with highly similar source or target sentences.
We then generate the final aligned examples from the candidates with a well-trained generation model.
arXiv Detail & Related papers (2022-03-04T08:21:27Z) - The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task [19.98425994656106]
We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies.
The task involves parsing enhanced graphs, which are an extension of the basic dependency trees designed to better represent semantic structure.
Evaluation is carried out on 29 treebanks in 17 languages, and participants are required to parse the data from each language starting from raw strings.
arXiv Detail & Related papers (2021-07-05T12:42:59Z) - Constructing Taxonomies from Pretrained Language Models [52.53846972667636]
We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models.
Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those predictions into trees.
We train our model on subtrees sampled from WordNet, and test on non-overlapping WordNet subtrees.
arXiv Detail & Related papers (2020-10-24T07:16:21Z) - Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z) - Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z) - The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task [12.226699055857182]
We describe the ADAPT system for the 2020 IWPT Shared Task on parsing enhanced Universal Dependencies in 17 languages.
We implement a pipeline approach using UDPipe and UDPipe-future to provide initial levels of annotation.
For the majority of languages, a semantic dependency parser can be successfully applied to the task of parsing enhanced dependencies.
arXiv Detail & Related papers (2020-09-03T14:43:04Z) - Køpsala: Transition-Based Graph Parsing via Efficient Training and
Effective Encoding [13.490365811869719]
We present Køpsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020.
Our system is a pipeline consisting of off-the-shelf models for everything but enhanced parsing, and for the latter, a transition-based graph parser adapted from Che et al.
Our evaluation demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies, according to average ELAS.
arXiv Detail & Related papers (2020-05-25T13:17:09Z) - Towards Instance-Level Parser Selection for Cross-Lingual Transfer of
Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS).
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.