Persian Rhetorical Structure Theory
- URL: http://arxiv.org/abs/2106.13833v1
- Date: Fri, 25 Jun 2021 18:15:47 GMT
- Title: Persian Rhetorical Structure Theory
- Authors: Sara Shahmohammadi, Hadi Veisi, Ali Darzi
- Abstract summary: We present a discourse-annotated corpus for the Persian language built in the framework of Rhetorical Structure Theory.
Our corpus consists of 150 journalistic texts, each text having an average of around 400 words.
Our text-level discourse parser is trained using gold segmentation and is built upon the DPLP discourse parser.
- Score: 2.610470075814367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past years, interest in discourse analysis and discourse parsing has
steadily grown, and many discourse-annotated corpora and, as a result,
discourse parsers have been built. In this paper, we present a
discourse-annotated corpus for the Persian language built in the framework of
Rhetorical Structure Theory as well as a discourse parser built upon the DPLP
parser, an open-source discourse parser. Our corpus consists of 150
journalistic texts, each text having an average of around 400 words. Corpus
texts were annotated using 18 discourse relations and based on the annotation
guideline of the English RST Discourse Treebank corpus. Our text-level
discourse parser is trained using gold segmentation and is built upon the DPLP
discourse parser, which uses a large-margin transition-based approach to solve
the problem of discourse parsing. The performance of our discourse parser in
span (S), nuclearity (N), and relation (R) detection is around 78%, 64%, and
44%, respectively, in terms of F1 measure.
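As a rough illustration of how span (S), nuclearity (N), and relation (R) F1 scores of this kind can be computed, the following minimal sketch compares the labeled spans of a gold tree with those of a predicted tree. The `Span` type, the `rst_scores` helper, and the choice to score R on span plus relation label are assumptions made for this example; the paper's exact evaluation procedure is not described in the abstract.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Span(NamedTuple):
    """One constituent of an RST tree (hypothetical representation)."""
    start: int       # index of the first EDU covered by the span
    end: int         # index of the last EDU covered (inclusive)
    nuclearity: str  # "N" for nucleus, "S" for satellite
    relation: str    # relation label, e.g. "Elaboration"


def f1(gold: Set[Tuple], pred: Set[Tuple]) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def rst_scores(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Return F1 for span (S), nuclearity (N), and relation (R) matching.

    Assumption: N and R are each scored as span boundaries plus one label.
    """
    gold, pred = list(gold), list(pred)

    def project(spans, level):
        if level == "S":
            return {(s.start, s.end) for s in spans}
        if level == "N":
            return {(s.start, s.end, s.nuclearity) for s in spans}
        return {(s.start, s.end, s.relation) for s in spans}

    return {lvl: f1(project(gold, lvl), project(pred, lvl)) for lvl in ("S", "N", "R")}


# Toy example with four EDUs; the trees are invented for illustration only.
gold_tree = [
    Span(0, 1, "N", "Elaboration"),
    Span(2, 3, "S", "Cause"),
    Span(0, 3, "N", "Joint"),
]
pred_tree = [
    Span(0, 1, "N", "Elaboration"),
    Span(2, 3, "N", "Cause"),
    Span(1, 3, "N", "Joint"),
]
scores = rst_scores(gold_tree, pred_tree)
print(scores)
```

In this toy case the predicted tree gets two of three span boundaries right but only one nuclearity label, so N scores lower than S. Because each level adds a label on top of the span boundaries, N and R cannot exceed S under this scheme.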
Related papers
- Llamipa: An Incremental Discourse Parser [6.9534924995446055]
This paper provides the first discourse parsing experiments with a large language model finetuned on corpora in the style of SDRT.
It can process discourse data, which is essential for the eventual use of discourse information in downstream tasks.
arXiv Detail & Related papers (2024-06-26T11:08:17Z)
- Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech [8.550564152063522]
We report on a set of experiments aiming at assessing the performance of two parsing paradigms on speech parsing.
We perform this evaluation on a large treebank of spoken French, featuring realistic spontaneous conversations.
Our findings show that (i) the graph-based approach obtains better results across the board, and (ii) parsing directly from speech outperforms a pipeline approach, despite having 30% fewer parameters.
arXiv Detail & Related papers (2024-06-18T13:46:10Z)
- SpeechAlign: a Framework for Speech Translation Alignment Evaluation [15.069228503777124]
SpeechAlign is a framework designed to evaluate the underexplored field of source-target alignment in speech models.
To tackle the absence of suitable evaluation datasets, we introduce the Speech Gold Alignment dataset.
We also introduce two novel metrics, Speech Alignment Error Rate (SAER) and Time-weighted Speech Alignment Error Rate (TW-SAER).
arXiv Detail & Related papers (2023-09-20T18:46:37Z)
- Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z)
- Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences [67.37544997614646]
We present the first study on unsupervised spoken constituency parsing.
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees.
We show that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
arXiv Detail & Related papers (2023-03-15T17:57:22Z)
- BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric [66.73705349465207]
End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics.
We propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems.
arXiv Detail & Related papers (2022-12-16T14:00:26Z)
- RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z)
- Discourse Analysis for Evaluating Coherence in Video Paragraph Captions [99.37090317971312]
We explore a novel discourse-based framework to evaluate the coherence of video paragraphs.
Central to our approach is the discourse representation of videos, which helps in modeling coherence of paragraphs conditioned on coherence of videos.
Our experimental results show that the proposed framework evaluates coherence of video paragraphs significantly better than all the baseline methods.
arXiv Detail & Related papers (2022-01-17T04:23:08Z)
- Penn-Helsinki Parsed Corpus of Early Modern English: First Parsing Results and Analysis [2.8749014299466444]
We present the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.9 million word treebank.
We describe key features of PPCEME that make it challenging for parsing, including a larger and more varied set of function tags than in the Penn Treebank.
arXiv Detail & Related papers (2021-12-15T23:56:21Z)
- A Novel Corpus of Discourse Structure in Humans and Computers [55.74664144248097]
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses.
The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2.
arXiv Detail & Related papers (2021-11-10T20:56:08Z)
- FT Speech: Danish Parliament Speech Corpus [21.190182627955817]
This paper introduces FT Speech, a new speech corpus created from the recorded meetings of the Danish Parliament.
The corpus contains over 1,800 hours of transcribed speech by a total of 434 speakers.
It is significantly larger in duration, vocabulary, and amount of spontaneous speech than the existing public speech corpora for Danish.
arXiv Detail & Related papers (2020-05-25T19:51:18Z)