Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
- URL: http://arxiv.org/abs/2303.07991v1
- Date: Tue, 14 Mar 2023 15:45:35 GMT
- Title: Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
- Authors: Kamil Bujel, Andrew Caines, Helen Yannakoudakis and Marek Rei
- Abstract summary: We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token level.
We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets.
- Score: 20.10172411803626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-sequence transformers are designed to improve the representation of
longer texts by language models and their performance on downstream
document-level tasks. However, not much is understood about the quality of
token-level predictions in long-form models. We investigate the performance of
such architectures in the context of document classification with unsupervised
rationale extraction. We find standard soft attention methods to perform
significantly worse when combined with the Longformer language model. We
propose a compositional soft attention architecture that applies RoBERTa
sentence-wise to extract plausible rationales at the token level. We find this
method to significantly outperform Longformer-driven baselines on sentiment
classification datasets, while also exhibiting significantly lower runtimes.
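As a rough illustration of the proposed composition (a minimal sketch under stated assumptions, not the authors' released code), the snippet below encodes each sentence of a document independently with RoBERTa, scores tokens with a small linear head, and uses the resulting soft attention weights both to pool representations for classification and to serve as token-level rationales. The scoring head, the mean composition over sentences, and the label count are illustrative assumptions.

```python
# Minimal sketch of a compositional soft-attention classifier: RoBERTa runs
# sentence-wise, a linear head scores tokens, and the softmax weights act as
# token-level rationales. Illustrative only; not the paper's exact model.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class CompositionalSoftAttention(nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size
        self.token_scorer = nn.Linear(hidden, 1)   # per-token attention logits
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # input_ids: (num_sentences, max_len) -- one document, split by sentence
        tokens = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.token_scorer(tokens).squeeze(-1)            # (S, T)
        logits = logits.masked_fill(attention_mask == 0, float("-inf"))
        alpha = torch.softmax(logits, dim=-1)                     # rationale weights
        sent_vecs = torch.einsum("st,sth->sh", alpha, tokens)     # soft-attention pooling
        doc_vec = sent_vecs.mean(dim=0)                           # compose sentences
        return self.classifier(doc_vec), alpha

tok = RobertaTokenizer.from_pretrained("roberta-base")
batch = tok(["The film was wonderful.", "I would watch it again."],
            padding=True, return_tensors="pt")
model = CompositionalSoftAttention()
class_logits, rationale_weights = model(batch["input_ids"], batch["attention_mask"])
```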
Related papers
- How much do contextualized representations encode long-range context? [10.188367784207049]
We analyze contextual representations in neural autoregressive language models, emphasizing long-range contexts that span several thousand tokens.
Our methodology employs a perturbation setup and the metric Anisotropy-Calibrated Cosine Similarity to capture the degree of contextualization of long-range patterns from the perspective of representation geometry.
arXiv Detail & Related papers (2024-10-16T06:49:54Z)
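The paper's exact definition is not reproduced above, but one plausible reading of an anisotropy-calibrated similarity (an assumption for illustration, not the paper's formula) is the raw cosine similarity offset by the mean similarity of randomly paired representations, which estimates the anisotropy baseline:

```python
# Hypothetical sketch: cosine similarity calibrated by an anisotropy baseline,
# estimated as the mean similarity between randomly paired representations.
# This is an assumed reading, not the paper's exact metric.
import torch
import torch.nn.functional as F

def anisotropy_calibrated_cos(a, b, pool):
    """a, b: (H,) representation vectors; pool: (N, H) sampled representations."""
    raw = F.cosine_similarity(a, b, dim=0)
    idx = torch.randperm(pool.size(0))                        # random re-pairing
    baseline = F.cosine_similarity(pool, pool[idx], dim=1).mean()
    return raw - baseline

score = anisotropy_calibrated_cos(torch.randn(768), torch.randn(768),
                                  torch.randn(1000, 768))
```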
- A Novel LLM-based Two-stage Summarization Approach for Long Dialogues [9.835499880812646]
This study proposes a hierarchical framework that segments and condenses information from long documents.
The condensation stage utilizes an unsupervised generation model to generate condensed data.
The summarization stage fine-tunes the abstractive summarization model on the condensed data to generate the final results.
arXiv Detail & Related papers (2024-10-09T03:42:40Z)
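A hedged sketch of such a segment-then-summarize pipeline follows; the fixed-size segmentation and the BART checkpoint are placeholder assumptions standing in for the paper's unsupervised condensation model and fine-tuned summarizer.

```python
# Illustrative two-stage pipeline: segment the long input, condense each
# segment, then summarize the concatenated condensations. The models and
# the character-based segmenter are placeholders, not the paper's setup.
from transformers import pipeline

condenser = pipeline("summarization", model="facebook/bart-large-cnn")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def segment(text: str, max_chars: int = 2000) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def two_stage_summary(long_dialogue: str) -> str:
    condensed = " ".join(
        condenser(seg, max_length=128, min_length=32)[0]["summary_text"]
        for seg in segment(long_dialogue)
    )
    return summarizer(condensed, max_length=160, min_length=40)[0]["summary_text"]
```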
- Summarizing long regulatory documents with a multi-step pipeline [2.2591852560804675]
We show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies depending on the model used.
For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance.
arXiv Detail & Related papers (2024-08-19T08:07:25Z)
- LOCOST: State-Space Models for Long Document Abstractive Summarization [76.31514220737272]
We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs.
With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns.
arXiv Detail & Related papers (2024-01-31T15:33:37Z)
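The $O(L \log L)$ term matches the cost of FFT-based long convolution, which is the standard way state-space layers are applied over a full sequence; the toy sketch below assumes that route, and its random kernel is a placeholder rather than LOCOST's parameterization.

```python
# Toy illustration of the O(L log L) route: a state-space layer applied as a
# long convolution computed with FFTs. The kernel is a placeholder; LOCOST's
# actual state-space parameterization is not reproduced here.
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """u: (L,) input sequence; k: (L,) convolution kernel. Runs in O(L log L)."""
    L = u.size(0)
    U = torch.fft.rfft(u, n=2 * L)     # zero-pad to avoid circular wrap-around
    K = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(U * K, n=2 * L)[:L]

y = fft_long_conv(torch.randn(4096), torch.randn(4096))
```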
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
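A hypothetical rendering of such a selection step (the linear scorer, keep ratio, and top-k rule are assumptions, not HanoiT's published mechanism): context tokens are scored, and only the top-scoring ones are passed to the next layer.

```python
# Hypothetical layer-wise context selector: score context tokens and keep only
# the top fraction for the next layer. Scorer and keep ratio are assumptions.
import torch
import torch.nn as nn

class ContextSelector(nn.Module):
    def __init__(self, hidden: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden, 1)
        self.keep_ratio = keep_ratio

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        # ctx: (num_context_tokens, hidden) document-level context states
        scores = self.scorer(ctx).squeeze(-1)
        k = max(1, int(self.keep_ratio * ctx.size(0)))
        keep = scores.topk(k).indices.sort().values   # preserve original order
        return ctx[keep]                              # sifted context

sifted = ContextSelector(hidden=512)(torch.randn(200, 512))
```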
- Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
- Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z)
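For reference, generic kernelized linear attention (the mechanism family this paper builds on; the feature map below is a common choice, and the sentential gate is omitted) avoids the $L \times L$ score matrix, so cost grows linearly with sequence length:

```python
# Generic non-causal kernelized linear attention: with a positive feature map
# phi, phi(K)^T V is built once, avoiding the L x L score matrix. This is the
# mechanism family only; the paper's model and sentential gate are omitted.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps: float = 1e-6):
    """q, k: (L, D); v: (L, Dv)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1            # positive feature map phi
    kv = k.transpose(0, 1) @ v                   # (D, Dv), computed once
    z = q @ k.sum(dim=0).unsqueeze(-1) + eps     # (L, 1) normalizer
    return (q @ kv) / z

out = linear_attention(torch.randn(1024, 64), torch.randn(1024, 64),
                       torch.randn(1024, 64))
```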
- Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
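As a generic illustration of local-plus-global sparse attention (HETFORMER's heterogeneous, multi-granularity graph over tokens, sentences, and documents is not reproduced here), a boolean mask can restrict scoring to a sliding window plus a few global positions:

```python
# Generic local + global sparse attention mask: each position attends within a
# sliding window, and designated global positions attend everywhere. This
# shows the flavor only, not HETFORMER's multi-granularity attention.
import torch

def sparse_mask(seq_len: int, window: int = 4, global_idx=(0,)) -> torch.Tensor:
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        mask[i, max(0, i - window):i + window + 1] = True   # local window
    for g in global_idx:
        mask[g, :] = True    # global position attends to all
        mask[:, g] = True    # all positions attend to it
    return mask

m = sparse_mask(16)   # True marks the only query-key pairs that are scored
```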
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.