DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
- URL: http://arxiv.org/abs/2110.04518v1
- Date: Sat, 9 Oct 2021 09:15:56 GMT
- Title: DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
- Authors: Zhengyuan Liu, Ke Shi, Nancy F. Chen
- Abstract summary: We propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly.
Our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.
- Score: 24.986030179701405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text discourse parsing plays an important role in understanding
information flow and argumentative structure in natural language, making it
beneficial for downstream tasks. While previous work has significantly
improved the performance of RST discourse parsing, existing models are not
readily applicable to practical use cases: (1) EDU segmentation is not
integrated into most existing tree parsing frameworks, so it is not
straightforward to apply such models to new data. (2) Most parsers cannot be
used in multilingual scenarios, because they are developed for English only.
(3) Parsers trained on single-domain treebanks do not generalize well to
out-of-domain inputs. In this work, we propose a
document-level multilingual RST discourse parsing framework, which conducts EDU
segmentation and discourse tree parsing jointly. Moreover, we propose a
cross-translation augmentation strategy to enable the framework to support
multilingual parsing and improve its domain generality. Experimental results
show that our model achieves state-of-the-art performance on document-level
multilingual RST parsing in all sub-tasks.
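The cross-translation augmentation idea lends itself to a compact illustration. The sketch below is a minimal, hypothetical rendering of EDU-level translation for treebank augmentation, not the authors' released code: the `RSTExample` data structure is invented for illustration, and `translate` is a stub standing in for a real MT system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RSTExample:
    lang: str
    edus: List[str]   # gold EDU segmentation
    tree: str         # serialized discourse tree over EDU indices

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder MT call; swap in a real translation system."""
    return f"<{tgt}> {text}"  # identity stand-in, tagged with target language

def cross_translate(example: RSTExample, tgt_lang: str) -> RSTExample:
    # Translating EDU-by-EDU keeps segment boundaries aligned, so the
    # original tree (defined over EDU indices) remains valid as-is.
    new_edus = [translate(e, example.lang, tgt_lang) for e in example.edus]
    return RSTExample(lang=tgt_lang, edus=new_edus, tree=example.tree)

# Usage: turn an English training example into synthetic German data.
en = RSTExample("en", ["Despite the rain,", "the match went ahead."],
                "(1-2: Concession)")
print(cross_translate(en, "de"))
```

Because each EDU is translated in isolation, gold segment boundaries and the tree over segment indices carry over to the new language for free, which is what makes this a cheap way to obtain multilingual training signal.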
Related papers
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models.
We conduct comprehensive experiments on the English datasets QQP and MRPC, and the Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Cross-domain Chinese Sentence Pattern Parsing [67.1381983012038]
Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching.
Existing SPS parsers rely heavily on textbook corpora for training and lack cross-domain capability.
This paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework.
arXiv Detail & Related papers (2024-02-26T05:30:48Z)
- RST-style Discourse Parsing Guided by Document-level Content Structures [27.28989421841165]
Existing RST parsing pipelines construct rhetorical structures without knowledge of document-level content structures.
We propose a novel pipeline for RST-DP that incorporates structure-aware news content sentence representations.
arXiv Detail & Related papers (2023-09-08T05:50:27Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences [67.37544997614646]
We present the first study on unsupervised spoken constituency parsing.
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees.
We show that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
arXiv Detail & Related papers (2023-03-15T17:57:22Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- A Simple and Strong Baseline for End-to-End Neural RST-style Discourse Parsing [44.72809363746258]
This paper explores a strong baseline by integrating existing simple parsing strategies, top-down and bottom-up, with various transformer-based pre-trained language models.
The experimental results obtained from two benchmark datasets demonstrate that the parsing performance relies on the pretrained language models rather than the parsing strategies.
arXiv Detail & Related papers (2022-10-15T18:38:08Z)
- LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding [33.78249073009646]
We propose a simple yet effective Language-independent Layout Transformer (LiLT) for structured document understanding.
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages.
Experimental results on eight languages have shown that LiLT can achieve competitive or even superior performance on diverse widely-used downstream benchmarks.
arXiv Detail & Related papers (2022-02-28T10:33:01Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intent and slot representations separately and cast both prediction tasks into sequence labeling problems.
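As a rough illustration of casting flattened slot prediction as sequence labeling: the BIO scheme, example query, and helper `to_bio` below are generic illustrations, not X2Parser's exact label format.

```python
# Toy example: flattened slot spans over a token sequence.
tokens = ["set", "an", "alarm", "for", "8", "am", "tomorrow"]
# Flattened slot spans: (start, end_exclusive, label)
slots = [(4, 6, "TIME"), (6, 7, "DATE")]

def to_bio(tokens, slots):
    """Convert flattened spans into per-token BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in slots:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

print(list(zip(tokens, to_bio(tokens, slots))))
# A token classifier (e.g., an encoder with a per-token softmax) is then
# trained on these labels, while intents are predicted separately.
```

Flattening trades the explicit nesting of TCSP trees for labels that any off-the-shelf sequence tagger can predict, which is what makes the formulation transferable across languages and domains.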
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- RST Parsing from Scratch [14.548146390081778]
We introduce a novel end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework.
Our framework facilitates discourse parsing from scratch without requiring discourse segmentation as a prerequisite.
Our unified parsing model adopts a beam search to decode the best tree structure by searching through a space of high-scoring trees.
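For intuition, here is a toy top-down beam-search decoder over EDU spans: partial trees are scored and only the top-k candidates are kept while spans are split. The names `split_score` and `beam_parse` are invented for this sketch, the scorer is a random stub standing in for learned split scores, and the search is simplified relative to the paper.

```python
import heapq
import random

random.seed(0)

def split_score(i, k, j):
    """Stub for a learned scorer over splitting span [i, j) at position k."""
    return random.random()

def beam_parse(n_edus, beam_size=3):
    # Each state: (negated_score, spans_still_to_split, split_decisions)
    beam = [(0.0, [(0, n_edus)], [])]
    while any(spans for _, spans, _ in beam):
        candidates = []
        for score, spans, tree in beam:
            if not spans:                       # finished tree: carry forward
                candidates.append((score, spans, tree))
                continue
            (i, j), rest = spans[0], spans[1:]
            if j - i == 1:                      # single EDU: a leaf
                candidates.append((score, rest, tree))
                continue
            for k in range(i + 1, j):           # try every split point
                new_spans = [(i, k), (k, j)] + rest
                candidates.append((score - split_score(i, k, j),
                                   new_spans, tree + [(i, k, j)]))
        # Keep only the top-k highest-scoring partial trees.
        beam = heapq.nsmallest(beam_size, candidates, key=lambda c: c[0])
    return beam[0][2]                           # best tree's split decisions

print(beam_parse(4))
```

Keeping several high-scoring partial trees instead of greedily committing to each split is what lets the decoder recover from locally attractive but globally poor structure decisions.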
arXiv Detail & Related papers (2021-05-23T06:19:38Z)
- Multilingual Neural RST Discourse Parsing [24.986030179701405]
We investigate two approaches to establishing a neural, cross-lingual discourse parser: via multilingual vector representations and via segment-level translation.
Experimental results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing.
arXiv Detail & Related papers (2020-12-03T05:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.