Learning to Summarize Long Texts with Memory Compression and Transfer
- URL: http://arxiv.org/abs/2010.11322v1
- Date: Wed, 21 Oct 2020 21:45:44 GMT
- Title: Learning to Summarize Long Texts with Memory Compression and Transfer
- Authors: Jaehong Park, Jonathan Pilault and Christopher Pal
- Abstract summary: We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network-based encoder-decoder architectures.
Our memory regularization compresses an encoded input article into a more compact set of sentence representations.
- Score: 3.5407857489235206
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical
recurrent neural network-based encoder-decoder architectures, and we explore its use for
abstractive document summarization. Mem2Mem transfers "memories" via
readable/writable external memory modules that augment both the encoder and
decoder. Our memory regularization compresses an encoded input article into a
more compact set of sentence representations. Most importantly, the memory
compression step performs implicit extraction without labels, sidestepping
issues with suboptimal ground-truth data and exposure bias of hybrid
extractive-abstractive summarization techniques. By allowing the decoder to
read/write over the encoded input memory, the model learns to read salient
information about the input article while keeping track of what has been
generated. Our Mem2Mem approach yields results that are competitive with
state-of-the-art transformer-based summarization methods, but with 16 times
fewer parameters.
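The memory compression and read/write mechanism described in the abstract can be illustrated with a short sketch. The PyTorch code below is a minimal, generic illustration of the idea, not the authors' implementation: the learned slot queries, the softmax-based soft selection, and the gated additive write are assumptions made for the example, and all module names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryCompressor(nn.Module):
    """Compress N sentence representations into K memory slots (soft extraction)."""
    def __init__(self, d_model: int, num_slots: int):
        super().__init__()
        self.slot_queries = nn.Parameter(torch.randn(num_slots, d_model))

    def forward(self, sent_states: torch.Tensor) -> torch.Tensor:
        # sent_states: (num_sentences, d_model)
        scores = self.slot_queries @ sent_states.T      # (K, N) salience scores
        weights = F.softmax(scores, dim=-1)             # soft selection, no extraction labels
        return weights @ sent_states                    # (K, d_model) compressed memory

class MemoryReadWrite(nn.Module):
    """One decoder step: read a context vector from memory, then write back to it."""
    def __init__(self, d_model: int):
        super().__init__()
        self.write_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, dec_state: torch.Tensor, memory: torch.Tensor):
        # dec_state: (d_model,), memory: (K, d_model)
        attn = F.softmax(memory @ dec_state, dim=0)     # (K,) read attention
        read = attn @ memory                            # (d_model,) read vector
        # Additive write so memory can track what has already been generated.
        update = torch.tanh(self.write_proj(torch.cat([read, dec_state])))
        memory = memory + attn.unsqueeze(-1) * update   # (K, d_model) updated memory
        return read, memory

# Toy usage with hypothetical sizes.
d_model, n_sent, k_slots = 8, 20, 4
compressor = MemoryCompressor(d_model, k_slots)
read_write = MemoryReadWrite(d_model)
memory = compressor(torch.randn(n_sent, d_model))            # encoder-side compression
read_vec, memory = read_write(torch.randn(d_model), memory)  # one decoder step
```

In this toy usage, the read vector would condition the decoder's next-token prediction, and repeating the read/write step at each decoding step corresponds to the behavior the abstract describes.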
Related papers
- BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments [53.71158537264695]
Large language models (LLMs) have revolutionized numerous applications, yet their deployment remains challenged by memory constraints on local devices.
We introduce BitStack, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance.
arXiv Detail & Related papers (2024-10-31T13:26:11Z)
- Context Compression for Auto-regressive Transformers with Sentinel Tokens [37.07722536907739]
We propose a plug-and-play approach that is able to incrementally compress the intermediate activations of a specified span of tokens into compact ones.
Experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of our approach.
arXiv Detail & Related papers (2023-10-12T09:18:19Z)
- Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation [0.20305676256390928]
We propose an Implicit Memory Transformer that implicitly retains memory through a new left context method.
Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass.
arXiv Detail & Related papers (2023-07-03T22:20:21Z)
- GLIMMER: generalized late-interaction memory reranker [29.434777627686692]
Memory-augmentation is a powerful approach for incorporating external information into language models.
Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder.
We propose GLIMMER, which improves on this approach by exploiting free access to the powerful memory representations, applying a shallow reranker on top of memory to drastically improve retrieval quality at low cost.
arXiv Detail & Related papers (2023-06-17T01:54:25Z)
- Differentiable Neural Computers with Memory Demon [0.0]
We show that information theoretic properties of the memory contents play an important role in the performance of such architectures.
We introduce a novel memory demon concept into DNC architectures, which modifies the memory contents implicitly via additive input encoding.
arXiv Detail & Related papers (2022-11-05T22:24:47Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Kanerva++: extending the Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)
- Text Compression-aided Transformer Encoding [77.16960983003271]
We propose explicit and implicit text compression approaches to enhance the Transformer encoding.
Backbone information, meaning the gist of the input text, is otherwise not specifically focused on.
Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines.
arXiv Detail & Related papers (2021-02-11T11:28:39Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [128.36951818335046]
We propose a new approach called Memory-Augmented Recurrent Transformer (MART).
MART uses a memory module to augment the transformer architecture.
MART generates more coherent and less repetitive paragraph captions than baseline methods.
arXiv Detail & Related papers (2020-05-11T20:01:41Z)
- Learning Directly from Grammar Compressed Text [17.91878224879985]
We propose a method to apply neural sequence models to text data compressed with grammar compression algorithms without decompression.
To encode the unique symbols that appear in compression rules, we introduce composer modules to incrementally encode the symbols into vector representations (see the illustrative sketch after this list).
arXiv Detail & Related papers (2020-02-28T06:51:40Z)
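As referenced in the "Learning Directly from Grammar Compressed Text" entry above, the composer-module idea can be sketched briefly. The code below is a hedged, generic illustration under assumed conventions (byte terminals, binary rules, a single-layer tanh composer); it is not that paper's actual implementation, and the rule format, names, and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class Composer(nn.Module):
    """Builds an embedding for a grammar nonterminal from its two children."""
    def __init__(self, d: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, d), nn.Tanh())

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([left, right], dim=-1))

d = 16
terminal_emb = nn.Embedding(256, d)   # embeddings for raw byte terminals (assumption)
composer = Composer(d)

# Hypothetical binary grammar: ids < 256 are terminals, ids >= 256 are rules.
rules = {256: (ord("a"), ord("b")), 257: (256, ord("c"))}  # 256 -> "ab", 257 -> "ab"+"c"

cache = {}
def embed(symbol: int) -> torch.Tensor:
    """Incrementally compose symbol embeddings bottom-up, caching results."""
    if symbol < 256:
        return terminal_emb(torch.tensor(symbol))
    if symbol not in cache:
        left, right = rules[symbol]
        cache[symbol] = composer(embed(left), embed(right))
    return cache[symbol]

# The compressed form of "abcd" is [257, ord("d")]; it is embedded and fed to
# any downstream sequence model without ever decompressing the original text.
inputs = torch.stack([embed(s) for s in [257, ord("d")]])  # shape: (2, d)
```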
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.