Discovering Elementary Discourse Units in Textual Data Using Canonical Correlation Analysis
- URL: http://arxiv.org/abs/2406.12997v2
- Date: Wed, 16 Oct 2024 03:52:13 GMT
- Title: Discovering Elementary Discourse Units in Textual Data Using Canonical Correlation Analysis
- Authors: Akanksha Mehndiratta, Krishna Asawa
- Abstract summary: This study takes a step further by demonstrating the potential of Canonical Correlation Analysis (CCA) in identifying Elementary Discourse Units (EDUs).
The model is simple, linear, adaptable, and language-independent, making it an ideal baseline, particularly when labeled training data is scarce or nonexistent.
- Abstract: Canonical Correlation Analysis (CCA) has been exploited extensively for learning latent representations in various fields. This study takes a step further by demonstrating the potential of CCA in identifying Elementary Discourse Units (EDUs) that capture the latent information within textual data. The probabilistic interpretation of CCA discussed in this study exploits the two-view nature of textual data, i.e., consecutive sentences in a document or turns in a dyadic conversation, and has a strong theoretical foundation. Furthermore, this study proposes a model for Elementary Discourse Unit (EDU) segmentation that discovers EDUs in textual data without any supervision. To validate the model, the EDUs are used as the textual unit for content selection in a textual similarity task. Empirical results on the Semantic Textual Similarity (STSB) and Mohler datasets confirm that, despite being represented as unigrams, the EDUs deliver competitive results and can even beat various sophisticated supervised techniques. The model is simple, linear, adaptable, and language-independent, making it an ideal baseline, particularly when labeled training data is scarce or nonexistent.
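As a concrete illustration of the two-view setup the abstract describes, the sketch below pairs each sentence with its successor and fits CCA on unigram count vectors. This is a minimal, hypothetical reconstruction under stated assumptions (a toy corpus, scikit-learn's CCA, and n_components=2), not the authors' implementation.

```python
# Minimal sketch of the two-view CCA setup from the abstract. The corpus,
# unigram count features, and n_components are illustrative assumptions,
# not details taken from the paper.
from sklearn.cross_decomposition import CCA
from sklearn.feature_extraction.text import CountVectorizer

# Consecutive sentences form the two views: view X is sentence t,
# view Y is sentence t+1.
sentences = [
    "The model learns latent representations from raw text.",
    "These representations capture discourse level information.",
    "Segmentation is then performed without any supervision.",
    "Unsupervised baselines matter when labeled data is scarce.",
    "Competitive results were reported on similarity benchmarks.",
]
counts = CountVectorizer().fit_transform(sentences).toarray()
X, Y = counts[:-1], counts[1:]

# CCA finds projections of the two views with maximal correlation;
# the shared directions are candidates for latent discourse structure.
cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)
print(X_c.shape, Y_c.shape)  # (4, 2) (4, 2)
```

In the paper's setting, such maximally correlated projections would supply the latent representations from which EDUs are segmented; here the output shapes simply confirm that both views are mapped into a shared low-dimensional space.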
Related papers
- CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs).
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z)
- A Hybrid Approach To Aspect Based Sentiment Analysis Using Transfer Learning [3.30307212568497]
We propose a hybrid approach for Aspect Based Sentiment Analysis using transfer learning.
The approach focuses on generating weakly-supervised annotations by exploiting the strengths of both large language models (LLM) and traditional syntactic dependencies.
arXiv Detail & Related papers (2024-03-25T23:02:33Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Enhancing Argument Structure Extraction with Efficient Leverage of
Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z) - Inducing Causal Structure for Abstractive Text Summarization [76.1000380429553]
We introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data.
We propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors.
Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.
arXiv Detail & Related papers (2023-08-24T16:06:36Z) - Exploring the Power of Topic Modeling Techniques in Analyzing Customer
Reviews: A Comparative Analysis [0.0]
Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online.
In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews.
Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
arXiv Detail & Related papers (2023-08-19T08:18:04Z)
- Unimodal and Multimodal Representation Training for Relation Extraction [0.0]
Multimodal integration of text, layout, and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE).
Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training.
While a bimodal text and layout approach performs best, we show that text is the most important single predictor of entity relations.
arXiv Detail & Related papers (2022-11-11T12:39:35Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Full-Text Argumentation Mining on Scientific Publications [3.8754200816873787]
We introduce a sequential pipeline model combining argumentative discourse unit recognition (ADUR) and argumentative relation extraction (ARE) for full-text scholarly argumentation mining (SAM).
We provide a first analysis of the performance of pretrained language models (PLMs) on both subtasks.
Our detailed error analysis reveals that non-contiguous ADUs as well as the interpretation of discourse connectors pose major challenges.
arXiv Detail & Related papers (2022-10-24T10:05:30Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.