A Novel Dependency Framework for Enhancing Discourse Data Analysis
- URL: http://arxiv.org/abs/2407.12473v1
- Date: Wed, 17 Jul 2024 10:55:00 GMT
- Title: A Novel Dependency Framework for Enhancing Discourse Data Analysis
- Authors: Kun Sun, Rong Wang
- Abstract summary: This study has as its primary focus the conversion of PDTB annotations into dependency structures.
It employs refined BERT-based discourse parsers to test the validity of the dependency data derived from the PDTB-style corpora in English, Chinese, and several other languages.
The results show that the PDTB dependency data is valid and that there is a strong correlation between the two types of dependency distance.
- Score: 27.152245569974678
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The development of different theories of discourse structure has led to the establishment of discourse corpora based on these theories. However, the existence of discourse corpora established on different theoretical bases creates challenges when it comes to exploring them in a consistent and cohesive way. This study has as its primary focus the conversion of PDTB annotations into dependency structures. It employs refined BERT-based discourse parsers to test the validity of the dependency data derived from the PDTB-style corpora in English, Chinese, and several other languages. By converting both PDTB and RST annotations for the same texts into dependencies, this study also applies "dependency distance" metrics to examine the correlation between RST dependencies and PDTB dependencies in English. The results show that the PDTB dependency data is valid and that there is a strong correlation between the two types of dependency distance. This study presents a comprehensive approach for analyzing and evaluating discourse corpora by employing discourse dependencies to achieve unified analysis. By applying dependency representations, we can extract data from PDTB, RST, and SDRT corpora in a coherent and unified manner. Moreover, the cross-linguistic validation establishes the framework's generalizability beyond English. The establishment of this comprehensive dependency framework overcomes limitations of existing discourse corpora, supporting a diverse range of algorithms and facilitating further studies in computational discourse analysis and language sciences.
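The "dependency distance" metric used in the abstract can be illustrated with a minimal sketch. Assuming a discourse dependency tree is encoded as a list of 1-based head indices, one per discourse unit, with 0 marking the root, the mean dependency distance is the average linear separation between each unit and its head. The encoding and function name below are illustrative assumptions, not the authors' implementation.

```python
def mean_dependency_distance(heads):
    """Mean dependency distance for a tree encoded as a list of
    1-based head indices, one per discourse unit (0 marks the root).

    The distance of a single link is the absolute difference between
    the linear positions of the dependent and its head.
    """
    distances = [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0

# Example: four units where unit 2 is the root, units 1 and 3
# attach to unit 2, and unit 4 attaches to unit 3.
# Link distances are |2-1|, |2-3|, |3-4| = 1, 1, 1.
mdd = mean_dependency_distance([2, 0, 2, 3])
```

Computed over the same texts for PDTB-derived and RST-derived trees, such per-text values are what a correlation analysis of the two dependency-distance distributions would compare.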
Related papers
- Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks [5.439020425819001]
We introduce a fully automatic approach to learn label embeddings during a classification task.
These embeddings are then utilized to map discourse relations from different frameworks.
arXiv Detail & Related papers (2024-03-29T14:18:26Z)
- eRST: A Signaled Graph Theory of Discourse Relations and Organization [14.074017875514787]
We present a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST).
The framework encompasses discourse relation graphs with tree-breaking, non-projective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses.
We present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens.
arXiv Detail & Related papers (2024-03-20T12:52:38Z)
- Cross-domain Chinese Sentence Pattern Parsing [67.1381983012038]
Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching.
Existing SPSs rely heavily on textbook corpora for training, lacking cross-domain capability.
This paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework.
arXiv Detail & Related papers (2024-02-26T05:30:48Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- A Pilot Study on Dialogue-Level Dependency Parsing for Chinese [21.698966896156087]
We develop a high-quality human-annotated corpus, which contains 850 dialogues and 199,803 dependencies.
Considering that such tasks suffer from high annotation costs, we investigate zero-shot and few-shot scenarios.
Based on an existing syntactic treebank, we adopt a signal-based method to transform seen syntactic dependencies into unseen ones.
arXiv Detail & Related papers (2023-05-21T12:20:13Z)
- Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations.
We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z)
- Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context.
We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from non-contextual PMI estimates do.
arXiv Detail & Related papers (2021-04-18T02:43:37Z)
- Unifying Discourse Resources with Dependency Framework [18.498060350460463]
We unify Chinese discourse corpora annotated under different schemes within a discourse dependency framework.
We implement several benchmark dependency parsers and investigate how they can leverage the unified data to improve performance.
arXiv Detail & Related papers (2021-01-01T05:23:29Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
- A Dependency Syntactic Knowledge Augmented Interactive Architecture for End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn).
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.