Learning to Summarize Long Texts with Memory Compression and Transfer
- URL: http://arxiv.org/abs/2010.11322v1
- Date: Wed, 21 Oct 2020 21:45:44 GMT
- Title: Learning to Summarize Long Texts with Memory Compression and Transfer
- Authors: Jaehong Park, Jonathan Pilault and Christopher Pal
- Abstract summary: We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network-based encoder-decoder architectures.
Our memory regularization compresses an encoded input article into a more compact set of sentence representations.
- Score: 3.5407857489235206
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent
neural network-based encoder-decoder architectures, and we explore its use for
abstractive document summarization. Mem2Mem transfers "memories" via
readable/writable external memory modules that augment both the encoder and
decoder. Our memory regularization compresses an encoded input article into a
more compact set of sentence representations. Most importantly, the memory
compression step performs implicit extraction without labels, sidestepping
issues with suboptimal ground-truth data and exposure bias of hybrid
extractive-abstractive summarization techniques. By allowing the decoder to
read/write over the encoded input memory, the model learns to read salient
information about the input article while keeping track of what has been
generated. Our Mem2Mem approach yields results that are competitive with
state-of-the-art transformer-based summarization methods, but with 16 times fewer
parameters.
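The abstract describes two mechanisms: an attention-based compression of encoded sentence representations into a smaller memory, and a decoder that reads from and writes to that transferred memory at each step. Below is a minimal sketch of one plausible realization of these two pieces in PyTorch; the slot count, gating scheme, and module names (MemoryCompressor, MemoryReadWrite) are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only, based on the abstract above; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryCompressor(nn.Module):
    """Compress N encoded sentence vectors into K memory slots via attention."""

    def __init__(self, hidden: int, num_slots: int):
        super().__init__()
        # Learned slot queries select salient sentences without extraction labels.
        self.slot_queries = nn.Parameter(torch.randn(num_slots, hidden))

    def forward(self, sentence_reprs: torch.Tensor) -> torch.Tensor:
        # sentence_reprs: (N, hidden) sentence states from a hierarchical encoder.
        scores = self.slot_queries @ sentence_reprs.T      # (K, N)
        weights = F.softmax(scores, dim=-1)                # soft, label-free "extraction"
        return weights @ sentence_reprs                    # (K, hidden) compressed memory


class MemoryReadWrite(nn.Module):
    """Let the decoder read from and write to the transferred memory at each step."""

    def __init__(self, hidden: int):
        super().__init__()
        self.read_attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.write_gate = nn.Linear(2 * hidden, hidden)

    def forward(self, dec_state: torch.Tensor, memory: torch.Tensor):
        # dec_state: (1, hidden) current decoder state; memory: (K, hidden).
        q = dec_state.unsqueeze(0)                         # (1, 1, hidden)
        kv = memory.unsqueeze(0)                           # (1, K, hidden)
        read, attn = self.read_attn(q, kv, kv)             # read salient input content
        # Gated write: nudge attended slots toward the decoder state so the memory
        # keeps track of what has already been generated (hypothetical update rule).
        gate = torch.sigmoid(self.write_gate(
            torch.cat([memory, dec_state.expand_as(memory)], dim=-1)))
        new_memory = memory + attn.squeeze(0).squeeze(0).unsqueeze(-1) * gate * dec_state
        return read.squeeze(0), new_memory


# Example usage (hypothetical sizes):
# memory = MemoryCompressor(hidden=512, num_slots=16)(sentence_reprs)   # (16, 512)
# context, memory = MemoryReadWrite(hidden=512)(dec_state, memory)
```

In a full encoder-decoder, the compressed memory produced from the encoder would be transferred to the decoder, and the updated memory returned by each decoding step would be carried forward to the next step.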
Related papers
- Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation [13.310412868082832]
Large-scale text encoders in text-to-image (T2I) diffusion models have demonstrated exceptional performance.
Despite their minimal contribution to total inference time and floating-point operations (FLOPs), text encoders demand significantly higher memory usage.
We propose Skip and Re-use layers (Skrr), a simple yet effective pruning strategy specifically designed for text encoders in T2I diffusion models.
arXiv Detail & Related papers (2025-02-12T15:03:26Z)
- On Memory Construction and Retrieval for Personalized Conversational Agents [69.46887405020186]
We propose SeCom, a method that constructs a memory bank with topical segments by introducing a conversation model, while performing memory retrieval based on compressed memory units.
Experimental results show that SeCom outperforms turn-level, session-level, and several summarization-based methods on long-term conversation benchmarks such as LOCOMO and Long-MT-Bench+.
arXiv Detail & Related papers (2025-02-08T14:28:36Z)
- BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments [53.71158537264695]
Large language models (LLMs) have revolutionized numerous applications, yet their deployment remains challenged by memory constraints on local devices.
We introduce BitStack, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance.
arXiv Detail & Related papers (2024-10-31T13:26:11Z)
- Context Compression for Auto-regressive Transformers with Sentinel Tokens [37.07722536907739]
We propose a plug-and-play approach that is able to incrementally compress the intermediate activation of a specified span of tokens into compact ones.
Experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of our approach.
arXiv Detail & Related papers (2023-10-12T09:18:19Z)
- Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation [0.20305676256390928]
We propose an Implicit Memory Transformer that implicitly retains memory through a new left context method.
Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass.
arXiv Detail & Related papers (2023-07-03T22:20:21Z)
- GLIMMER: generalized late-interaction memory reranker [29.434777627686692]
Memory-augmentation is a powerful approach for incorporating external information into language models.
Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder.
We propose GLIMMER, which improves on this approach by exploiting free access to the powerful memory representations, applying a shallow reranker on top of memory to drastically improve retrieval quality at low cost.
arXiv Detail & Related papers (2023-06-17T01:54:25Z)
- Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)
- Text Compression-aided Transformer Encoding [77.16960983003271]
We propose explicit and implicit text compression approaches to enhance the Transformer encoding.
Backbone information, meaning the gist of the input text, is not specifically focused on in standard Transformer encoding.
Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines.
arXiv Detail & Related papers (2021-02-11T11:28:39Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [128.36951818335046]
We propose a new approach called Memory-Augmented Recurrent Transformer (MART).
MART uses a memory module to augment the transformer architecture.
MART generates more coherent and less repetitive paragraph captions than baseline methods.
arXiv Detail & Related papers (2020-05-11T20:01:41Z)
- Learning Directly from Grammar Compressed Text [17.91878224879985]
We propose a method to apply neural sequence models to text data compressed with grammar compression algorithms without decompression.
To encode the unique symbols that appear in compression rules, we introduce composer modules to incrementally encode the symbols into vector representations.
arXiv Detail & Related papers (2020-02-28T06:51:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.