Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion
- URL: http://arxiv.org/abs/2204.08644v1
- Date: Tue, 19 Apr 2022 03:56:28 GMT
- Title: Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion
- Authors: Shunsuke Kando, Hiroshi Noji and Yusuke Miyao
- Abstract summary: We study the effect of nine conversion methods on neural language model (LM) performance across five languages.
On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages.
Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
- Score: 12.758523394180695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Incorporating stronger syntactic biases into neural language models (LMs) is a long-standing goal, but research in this area often focuses on modeling English text, where constituent treebanks are readily available. Extending constituent tree-based LMs to the multilingual setting, where dependency treebanks are more common, is possible via dependency-to-constituency conversion methods. However, this raises the question of which tree formats are best for learning the model, and for which languages. We investigate this question by training recurrent neural network grammars (RNNGs) using various conversion methods, and evaluating them empirically in a multilingual setting. We examine the effect on LM performance across nine conversion methods and five languages through seven types of syntactic tests. On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages. Our best model also shows an advantage over sequential/overparameterized LMs, suggesting the positive effect of syntax injection in a multilingual setting. Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
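To make the conversion step concrete, below is a minimal sketch (not the authors' code) of one of the simplest dependency-to-constituency strategies: a "flat" conversion in which each head word and all of its dependents are grouped into a single constituent with a dummy label X. The paper compares nine such conversion methods; this sketch only illustrates the general idea of turning a CoNLL-U-style head array into a bracketed tree that an RNNG could be trained on.

```python
# A minimal sketch of a "flat" dependency-to-constituency conversion.
# This is an illustrative assumption, not the paper's implementation:
# each head and its dependents become one constituent labeled X.

def flat_conversion(tokens, heads):
    """Convert a dependency tree to an S-expression constituency tree.

    tokens: list of word forms (tokens[0] is word 1).
    heads:  heads[i] is the 1-based head index of word i+1,
            with 0 marking the root, as in CoNLL-U.
    """
    # Collect the dependents of every head, preserving word order.
    children = {i: [] for i in range(len(tokens) + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)

    def build(head_idx):
        deps = children[head_idx]
        if not deps:  # a leaf: just the terminal word
            return tokens[head_idx - 1]
        # One flat constituent spanning the head and all its dependents,
        # with subtrees ordered by surface position.
        span = sorted(deps + [head_idx])
        inner = " ".join(
            tokens[j - 1] if j == head_idx else build(j) for j in span
        )
        return f"(X {inner})"

    return build(children[0][0])


# Example: "the dog barked", with "barked" as root, "dog" its
# dependent, and "the" a dependent of "dog".
print(flat_conversion(["the", "dog", "barked"], [2, 3, 0]))
# -> (X (X the dog) barked)
```

The design dimension the paper evaluates is exactly this choice: whether constituents are kept flat as above or restructured (e.g., binarized) in different ways before the RNNG is trained.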
Related papers
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions; for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Improving Multilingual Neural Machine Translation System for Indic Languages [0.0]
We propose a multilingual neural machine translation (MNMT) system to address the issues related to low-resource language translation.
A state-of-the-art transformer architecture is used to realize the proposed model.
Experiments on a large amount of data show that it outperforms conventional models.
arXiv Detail & Related papers (2022-09-27T09:51:56Z)
- LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing [10.355938901584567]
This paper addresses the problem of structured sentiment analysis using a bi-affine semantic dependency parser.
For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages.
For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to obtain noisy, likely ungrammatical, but annotated data.
In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English treebanks.
arXiv Detail & Related papers (2022-04-27T10:21:28Z)
- Incorporating Linguistic Knowledge for Abstractive Multi-document Summarization [20.572283625521784]
We develop a neural network based abstractive multi-document summarization (MDS) model.
We feed dependency information into a linguistically guided attention mechanism.
With the help of linguistic signals, sentence-level relations can be correctly captured.
arXiv Detail & Related papers (2021-09-23T08:13:35Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
Prior approaches introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
However, GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
- Cross-Lingual Adaptation Using Universal Dependencies [1.027974860479791]
We show that models trained using UD parse trees for complex NLP tasks can characterize very different languages.
Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages.
arXiv Detail & Related papers (2020-03-24T13:04:06Z)