Extend and Explain: Interpreting Very Long Language Models
- URL: http://arxiv.org/abs/2209.01174v1
- Date: Fri, 2 Sep 2022 17:15:43 GMT
- Title: Extend and Explain: Interpreting Very Long Language Models
- Authors: Joel Stremmel, Brian L. Hill, Jeffrey Hertzberg, Jaime Murillo,
Llewelyn Allotey, Eran Halperin
- Abstract summary: We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction.
MSP identifies 1.7x more clinically informative text blocks than the previous state-of-the-art, runs up to 100x faster, and is tractable for generating important phrase pairs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Transformer language models (LMs) are state-of-the-art for information
extraction, long text introduces computational challenges requiring suboptimal
preprocessing steps or alternative model architectures. Sparse-attention LMs
can represent longer sequences, overcoming performance hurdles. However, it
remains unclear how to explain predictions from these models, as not all tokens
attend to each other in the self-attention layers, and long sequences pose
computational challenges for explainability algorithms when runtime depends on
document length. These challenges are severe in the medical context where
documents can be very long, and machine learning (ML) models must be auditable
and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to
identify the text blocks that contribute to a prediction, apply MSP in the
context of predicting diagnoses from medical text, and validate our approach
with a blind review by two clinicians. Our method identifies about 1.7x more
clinically informative text blocks than the previous state-of-the-art, runs up
to 100x faster, and is tractable for generating important phrase pairs. MSP is
particularly well-suited to long LMs but can be applied to any text classifier.
We provide a general implementation of MSP.
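The core idea of a masked sampling procedure can be illustrated with a toy loop over fixed-size text blocks: repeatedly mask random subsets of blocks, re-score the document, and attribute importance to blocks whose masking lowers the predicted probability. This is an illustrative sketch, not the paper's implementation; the `predict_proba` interface, mask rate, and mask token are all hypothetical.

```python
import random

def masked_sampling(text_blocks, predict_proba, num_samples=100,
                    mask_prob=0.1, mask_token="[MASK]"):
    """Toy masked-sampling explainer (illustrative, not the paper's exact MSP).

    text_blocks: list of strings (fixed-size chunks of the document)
    predict_proba: callable mapping a document string to a probability
                   for the label of interest
    Returns one importance score per block: the average drop in the
    predicted probability over the samples in which that block was masked.
    """
    base = predict_proba(" ".join(text_blocks))
    drops = [0.0] * len(text_blocks)
    counts = [0] * len(text_blocks)
    for _ in range(num_samples):
        # Sample a random subset of blocks to mask out.
        masked = {i for i in range(len(text_blocks)) if random.random() < mask_prob}
        if not masked:
            continue
        doc = " ".join(mask_token if i in masked else b
                       for i, b in enumerate(text_blocks))
        p = predict_proba(doc)
        for i in masked:
            drops[i] += base - p
            counts[i] += 1
    return [drops[i] / counts[i] if counts[i] else 0.0
            for i in range(len(text_blocks))]
```

Because each forward pass scores one masked document regardless of length, the number of classifier calls is controlled by `num_samples` rather than by document length, which is the property that makes this style of procedure attractive for long inputs.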
Related papers
- Equipping Transformer with Random-Access Reading for Long-Context Understanding [9.433800833564279]
Long-context modeling presents a significant challenge for transformer-based large language models.
We propose a novel reading strategy that enables transformers to efficiently process long documents without examining every token.
arXiv Detail & Related papers (2024-05-21T21:41:07Z)
- TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents [34.52684986240312]
We introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques.
We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations.
In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios.
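The conventional Shapley-value computation that such methods accelerate can be approximated with a generic Monte Carlo permutation estimator over text blocks. This sketch is not TextGenSHAP's algorithm (which adds LM-specific speedups); the block granularity, `predict` interface, and mask token are hypothetical.

```python
import random

def shapley_estimate(blocks, predict, num_permutations=50, mask_token="[MASK]"):
    """Generic Monte Carlo Shapley estimator over text blocks.

    For each random permutation, blocks are revealed one at a time and each
    block is credited with the change in the model score when it is added.
    """
    n = len(blocks)
    phi = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        random.shuffle(order)
        present = [False] * n
        # Score of the fully masked document (empty coalition).
        prev = predict(" ".join(mask_token for _ in blocks))
        for i in order:
            present[i] = True
            doc = " ".join(b if present[j] else mask_token
                           for j, b in enumerate(blocks))
            cur = predict(doc)
            phi[i] += cur - prev  # marginal contribution of block i
            prev = cur
    return [v / num_permutations for v in phi]
```

A useful sanity check is the efficiency property: the attributions sum exactly to the score of the full document minus the score of the fully masked one, in every permutation. The cost, however, grows with the number of blocks times the number of permutations, which is the scaling problem that motivates faster post-hoc explanation methods.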
arXiv Detail & Related papers (2023-12-03T04:35:04Z)
- Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts [0.0]
We employ Abstract Meaning Representation (AMR) graphs to model low-resource health NLP tasks.
AMRs are well suited to model online health texts as they represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships.
Our experiments show that we can improve performance on 6 low-resource health NLP tasks by augmenting text embeddings with semantic graph embeddings.
arXiv Detail & Related papers (2023-09-18T15:37:30Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models [10.414206635385632]
Large Language Models (LLMs) have demonstrated remarkable generalization across diverse tasks.
A notable obstacle emerges when feeding numerical/temporal data into these models, such as data sourced from wearables or electronic health records.
We discuss recent works that employ LLMs for human-centric tasks such as in mobile health sensing and present a case study showing that popular LLMs tokenize temporal data incorrectly.
arXiv Detail & Related papers (2023-09-12T13:51:29Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension [59.80926970481975]
We study machine reading comprehension (MRC) on long texts.
A model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.
We propose to let a model learn to chunk in a more flexible way via reinforcement learning.
arXiv Detail & Related papers (2020-05-16T18:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.