Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
- URL: http://arxiv.org/abs/2303.07991v1
- Date: Tue, 14 Mar 2023 15:45:35 GMT
- Title: Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
- Authors: Kamil Bujel, Andrew Caines, Helen Yannakoudakis and Marek Rei
- Abstract summary: We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token level.
We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets.
- Score: 20.10172411803626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-sequence transformers are designed to improve the representation of
longer texts by language models and their performance on downstream
document-level tasks. However, not much is understood about the quality of
token-level predictions in long-form models. We investigate the performance of
such architectures in the context of document classification with unsupervised
rationale extraction. We find standard soft attention methods to perform
significantly worse when combined with the Longformer language model. We
propose a compositional soft attention architecture that applies RoBERTa
sentence-wise to extract plausible rationales at the token level. We find this
method to significantly outperform Longformer-driven baselines on sentiment
classification datasets, while also exhibiting significantly lower runtimes.
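As a rough illustration of the proposed composition (a minimal sketch under stated assumptions, not the authors' released code), the snippet below encodes each sentence of a document independently with RoBERTa, scores tokens with a small linear head, and uses the resulting soft attention weights both to pool representations for classification and to serve as token-level rationales. The scoring head, the mean composition over sentences, and the label count are illustrative assumptions.

```python
# Minimal sketch of a compositional soft-attention classifier: RoBERTa runs
# sentence-wise, a linear head scores tokens, and the softmax weights act as
# token-level rationales. Illustrative only; not the paper's exact model.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class CompositionalSoftAttention(nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size
        self.token_scorer = nn.Linear(hidden, 1)   # per-token attention logits
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # input_ids: (num_sentences, max_len) -- one document, split by sentence
        tokens = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.token_scorer(tokens).squeeze(-1)            # (S, T)
        logits = logits.masked_fill(attention_mask == 0, float("-inf"))
        alpha = torch.softmax(logits, dim=-1)                     # rationale weights
        sent_vecs = torch.einsum("st,sth->sh", alpha, tokens)     # soft-attention pooling
        doc_vec = sent_vecs.mean(dim=0)                           # compose sentences
        return self.classifier(doc_vec), alpha

tok = RobertaTokenizer.from_pretrained("roberta-base")
batch = tok(["The film was wonderful.", "I would watch it again."],
            padding=True, return_tensors="pt")
model = CompositionalSoftAttention()
class_logits, rationale_weights = model(batch["input_ids"], batch["attention_mask"])
```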
Related papers
- How much do contextualized representations encode long-range context? [10.188367784207049]
We analyze contextual representations in neural autoregressive language models, emphasizing long-range contexts that span several thousand tokens.
Our methodology employs a perturbation setup and the metric Anisotropy-Calibrated Cosine Similarity to capture the degree of contextualization of long-range patterns from the perspective of representation geometry.
arXiv Detail & Related papers (2024-10-16T06:49:54Z)
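The paper's exact definition is not reproduced above, but one plausible reading of an anisotropy-calibrated similarity (an assumption for illustration, not the paper's formula) is the raw cosine similarity offset by the mean similarity of randomly paired representations, which estimates the anisotropy baseline:

```python
# Hypothetical sketch: cosine similarity calibrated by an anisotropy baseline,
# estimated as the mean similarity between randomly paired representations.
# This is an assumed reading, not the paper's exact metric.
import torch
import torch.nn.functional as F

def anisotropy_calibrated_cos(a, b, pool):
    """a, b: (H,) representation vectors; pool: (N, H) sampled representations."""
    raw = F.cosine_similarity(a, b, dim=0)
    idx = torch.randperm(pool.size(0))                        # random re-pairing
    baseline = F.cosine_similarity(pool, pool[idx], dim=1).mean()
    return raw - baseline

score = anisotropy_calibrated_cos(torch.randn(768), torch.randn(768),
                                  torch.randn(1000, 768))
```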
- A Novel LLM-based Two-stage Summarization Approach for Long Dialogues [9.835499880812646]
This study proposes a hierarchical framework that segments and condenses information from long documents.
The condensation stage utilizes an unsupervised generation model to generate condensed data.
The summarization stage fine-tunes the abstractive summarization model on the condensed data to generate the final results.
arXiv Detail & Related papers (2024-10-09T03:42:40Z)
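A hedged sketch of such a segment-then-summarize pipeline follows; the fixed-size segmentation and the BART checkpoint are placeholder assumptions standing in for the paper's unsupervised condensation model and fine-tuned summarizer.

```python
# Illustrative two-stage pipeline: segment the long input, condense each
# segment, then summarize the concatenated condensations. The models and
# the character-based segmenter are placeholders, not the paper's setup.
from transformers import pipeline

condenser = pipeline("summarization", model="facebook/bart-large-cnn")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def segment(text: str, max_chars: int = 2000) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def two_stage_summary(long_dialogue: str) -> str:
    condensed = " ".join(
        condenser(seg, max_length=128, min_length=32)[0]["summary_text"]
        for seg in segment(long_dialogue)
    )
    return summarizer(condensed, max_length=160, min_length=40)[0]["summary_text"]
```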
- Summarizing long regulatory documents with a multi-step pipeline [2.2591852560804675]
We show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies depending on the model used.
For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance.
arXiv Detail & Related papers (2024-08-19T08:07:25Z)
- LOCOST: State-Space Models for Long Document Abstractive Summarization [76.31514220737272]
We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs.
With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns.
arXiv Detail & Related papers (2024-01-31T15:33:37Z)
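The $O(L \log L)$ term matches the cost of FFT-based long convolution, which is the standard way state-space layers are applied over a full sequence; the toy sketch below assumes that route, and its random kernel is a placeholder rather than LOCOST's parameterization.

```python
# Toy illustration of the O(L log L) route: a state-space layer applied as a
# long convolution computed with FFTs. The kernel is a placeholder; LOCOST's
# actual state-space parameterization is not reproduced here.
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """u: (L,) input sequence; k: (L,) convolution kernel. Runs in O(L log L)."""
    L = u.size(0)
    U = torch.fft.rfft(u, n=2 * L)     # zero-pad to avoid circular wrap-around
    K = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(U * K, n=2 * L)[:L]

y = fft_long_conv(torch.randn(4096), torch.randn(4096))
```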
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
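A hypothetical rendering of such a selection step (the linear scorer, keep ratio, and top-k rule are assumptions, not HanoiT's published mechanism): context tokens are scored, and only the top-scoring ones are passed to the next layer.

```python
# Hypothetical layer-wise context selector: score context tokens and keep only
# the top fraction for the next layer. Scorer and keep ratio are assumptions.
import torch
import torch.nn as nn

class ContextSelector(nn.Module):
    def __init__(self, hidden: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden, 1)
        self.keep_ratio = keep_ratio

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        # ctx: (num_context_tokens, hidden) document-level context states
        scores = self.scorer(ctx).squeeze(-1)
        k = max(1, int(self.keep_ratio * ctx.size(0)))
        keep = scores.topk(k).indices.sort().values   # preserve original order
        return ctx[keep]                              # sifted context

sifted = ContextSelector(hidden=512)(torch.randn(200, 512))
```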
- Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
- Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z)
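For reference, generic kernelized linear attention (the mechanism family this paper builds on; the feature map below is a common choice, and the sentential gate is omitted) avoids the $L \times L$ score matrix, so cost grows linearly with sequence length:

```python
# Generic non-causal kernelized linear attention: with a positive feature map
# phi, phi(K)^T V is built once, avoiding the L x L score matrix. This is the
# mechanism family only; the paper's model and sentential gate are omitted.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps: float = 1e-6):
    """q, k: (L, D); v: (L, Dv)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1            # positive feature map phi
    kv = k.transpose(0, 1) @ v                   # (D, Dv), computed once
    z = q @ k.sum(dim=0).unsqueeze(-1) + eps     # (L, 1) normalizer
    return (q @ kv) / z

out = linear_attention(torch.randn(1024, 64), torch.randn(1024, 64),
                       torch.randn(1024, 64))
```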
- Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
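As a generic illustration of local-plus-global sparse attention (HETFORMER's heterogeneous, multi-granularity graph over tokens, sentences, and documents is not reproduced here), a boolean mask can restrict scoring to a sliding window plus a few global positions:

```python
# Generic local + global sparse attention mask: each position attends within a
# sliding window, and designated global positions attend everywhere. This
# shows the flavor only, not HETFORMER's multi-granularity attention.
import torch

def sparse_mask(seq_len: int, window: int = 4, global_idx=(0,)) -> torch.Tensor:
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        mask[i, max(0, i - window):i + window + 1] = True   # local window
    for g in global_idx:
        mask[g, :] = True    # global position attends to all
        mask[:, g] = True    # all positions attend to it
    return mask

m = sparse_mask(16)   # True marks the only query-key pairs that are scored
```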
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.