Multilingual Neural RST Discourse Parsing
- URL: http://arxiv.org/abs/2012.01704v1
- Date: Thu, 3 Dec 2020 05:03:38 GMT
- Title: Multilingual Neural RST Discourse Parsing
- Authors: Zhengyuan Liu, Ke Shi, Nancy F. Chen
- Abstract summary: We investigate two approaches to establish a neural, cross-lingual discourse parser via multilingual vector representations and segment-level translation.
Experiment results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing.
- Score: 24.986030179701405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text discourse parsing plays an important role in understanding information
flow and argumentative structure in natural language. Previous research under
the Rhetorical Structure Theory (RST) has mostly focused on inducing and
evaluating models from the English treebank. However, the parsing tasks for
other languages such as German, Dutch, and Portuguese are still challenging due
to the shortage of annotated data. In this work, we investigate two approaches
to establish a neural, cross-lingual discourse parser via: (1) utilizing
multilingual vector representations; and (2) adopting segment-level translation
of the source content. Experiment results show that both methods are effective
even with limited training data, and achieve state-of-the-art performance on
cross-lingual, document-level discourse parsing on all sub-tasks.
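
The two approaches above lend themselves to a compact illustration. The following is a minimal sketch, assuming off-the-shelf Hugging Face checkpoints (xlm-roberta-base and Helsinki-NLP/opus-mt-de-en) and mean pooling as stand-ins; the example German segments are invented for illustration, and the paper's actual parser architecture is not reproduced here.

```python
# Sketch of the two cross-lingual strategies from the abstract above.
# Model names, example segments, and mean pooling are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer, MarianMTModel, MarianTokenizer

# Approach 1: encode source-language discourse segments with a multilingual
# encoder, so one parser can consume representations from any language.
xlmr_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
xlmr = AutoModel.from_pretrained("xlm-roberta-base")

segments_de = ["Der Zug hatte Verspätung,", "weil es stark schneite."]
inputs = xlmr_tok(segments_de, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = xlmr(**inputs).last_hidden_state          # (batch, seq, dim)
mask = inputs["attention_mask"].unsqueeze(-1)
seg_vectors = (hidden * mask).sum(1) / mask.sum(1)     # mean-pooled segment vectors

# Approach 2: translate each segment into English first, then feed the
# translations to a parser trained on the English RST treebank.
mt_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
mt = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-de-en")
translated = mt.generate(**mt_tok(segments_de, return_tensors="pt", padding=True))
segments_en = mt_tok.batch_decode(translated, skip_special_tokens=True)
print(segments_en)
```

Either path yields input that a single downstream RST parser can consume, which is what makes both methods usable when target-language training data is limited.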
Related papers
- Bilingual Rhetorical Structure Parsing with Large Parallel Annotations [5.439020425819001]
We introduce a parallel Russian annotation for the large and diverse English GUM RST corpus.
Our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora.
To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.
arXiv Detail & Related papers (2024-09-23T12:40:33Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing [24.986030179701405]
We propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly (a minimal segmentation sketch appears after this list).
Our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.
arXiv Detail & Related papers (2021-10-09T09:15:56Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo-label-based semi-supervised training strategy that uses a language model within an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose ERICA, a novel contrastive learning framework applied in the pre-training phase to obtain a deeper understanding of entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Investigating Language Impact in Bilingual Approaches for Computational Language Documentation [28.838960956506018]
This paper investigates how the choice of translation language affects subsequent documentation work.
We create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
Our results suggest that incorporating clues into the neural models' input representation increases their translation and alignment quality.
arXiv Detail & Related papers (2020-03-30T10:30:34Z)
- A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning [0.0]
We propose two approaches to dependency parsing, aimed especially at languages with a restricted amount of training data.
Our first approach combines a state-of-the-art deep learning-based parser with a rule-based approach, and the second incorporates morphological information into the network.
The proposed methods are developed for Turkish, but can be adapted to other languages as well.
arXiv Detail & Related papers (2020-02-24T08:34:33Z)
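
As referenced in the DMRST entry above, the following is a minimal sketch of the EDU segmentation sub-task, assuming a token-level boundary-tagging formulation on a multilingual encoder. The 2-label scheme, the encoder choice, and the omission of the joint tree-parsing component are all simplifying assumptions, not the paper's actual architecture.

```python
# Sketch of EDU segmentation as token-level boundary tagging.
# The label scheme and encoder are assumptions; the model below is untrained.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class EDUSegmenter(nn.Module):
    """Tags each token as 1 (EDU boundary) or 0 (inside an EDU)."""
    def __init__(self, encoder_name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)  # (batch, seq, 2) boundary logits

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = EDUSegmenter()
batch = tok(["The train was late, because it snowed heavily."],
            return_tensors="pt")
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])
boundaries = logits.argmax(-1)  # predicted per-token boundary tags
```

In a joint framework such as DMRST, the same encoder states would also feed a tree-parsing component; only the segmentation half is shown here.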
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.