Virtual Replay Cache
- URL: http://arxiv.org/abs/2112.03421v1
- Date: Mon, 6 Dec 2021 23:40:27 GMT
- Title: Virtual Replay Cache
- Authors: Brett Daley and Christopher Amato
- Abstract summary: We propose a new data structure, the Virtual Replay Cache (VRC), to address the large memory usage and repetitive data copies of return caching.
The VRC nearly eliminates DQN(λ)'s cache memory footprint and slightly reduces total training time on our hardware.
- Score: 20.531576904743282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Return caching is a recent strategy that enables efficient minibatch training
with multistep estimators (e.g. the λ-return) for deep reinforcement
learning. By precomputing return estimates in sequential batches and then
storing the results in an auxiliary data structure for later sampling, the
average computation spent per estimate can be greatly reduced. Still, the
efficiency of return caching could be improved, particularly with regard to its
large memory usage and repetitive data copies. We propose a new data structure,
the Virtual Replay Cache (VRC), to address these shortcomings. When learning to
play Atari 2600 games, the VRC nearly eliminates DQN(λ)'s cache memory
footprint and slightly reduces the total training time on our hardware.
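The two ideas in the abstract are easy to make concrete. Below is a minimal NumPy sketch: first, precomputing λ-returns backward over a sequential batch; then a cache in the spirit of the VRC that stores only scalar returns plus indices into the existing replay buffer rather than copies of the observations. The class interface and the `replay_buffer.get` accessor are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def lambda_returns(rewards, bootstrap_values, dones, gamma=0.99, lam=0.95):
    """Precompute lambda-returns backward over a sequential batch.

    bootstrap_values[t] is V(s_{t+1}); dones[t] marks an episode boundary.
    Recursion: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    """
    T = len(rewards)
    returns = np.empty(T, dtype=np.float32)
    g = bootstrap_values[-1]  # truncate the recursion with V(s_T)
    for t in reversed(range(T)):
        if dones[t]:
            g = rewards[t]  # do not bootstrap across episode boundaries
        else:
            g = rewards[t] + gamma * ((1.0 - lam) * bootstrap_values[t] + lam * g)
        returns[t] = g
    return returns

class VirtualReplayCache:
    """Hypothetical 'virtual' cache: keep only scalar returns keyed by
    replay-buffer indices, so no observation data is ever duplicated."""

    def __init__(self, capacity):
        self.indices = np.zeros(capacity, dtype=np.int64)
        self.returns = np.zeros(capacity, dtype=np.float32)
        self.size = 0

    def refresh(self, buffer_indices, cached_returns):
        # Overwrite the cache with freshly computed returns for a block of
        # sequential transitions already stored in the replay buffer.
        n = len(buffer_indices)
        self.indices[:n] = buffer_indices
        self.returns[:n] = cached_returns
        self.size = n

    def sample(self, replay_buffer, batch_size, rng):
        idx = rng.integers(0, self.size, size=batch_size)
        # States/actions are fetched lazily from the replay buffer;
        # `get` is an assumed accessor on the user's buffer class.
        states, actions = replay_buffer.get(self.indices[idx])
        return states, actions, self.returns[idx]
```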
Related papers
- Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization [36.251000184801576]
Retro has been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries.
We introduce Retro-li, which shows that retrieval can also help with a small-scale database, but it demands more accurate and better neighbors when searching a smaller, and hence sparser, non-parametric memory.
We show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while introducing noise into neighbor retrieval, at a minimal (about 1%) performance loss.
arXiv Detail & Related papers (2024-09-12T23:29:33Z)
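To make the Retro-li entry above concrete: its retrieval step is a nearest-neighbor search over a small non-parametric memory, which analog in-memory hardware would perturb with noise. A generic sketch (not the paper's implementation), with Gaussian noise standing in for the hardware:

```python
import numpy as np

def retrieve_neighbors(query, memory_keys, k=4, noise_std=0.0, rng=None):
    """Return indices of the k nearest memory entries to `query`.

    noise_std > 0 perturbs the similarity scores, loosely mimicking the
    imprecision an analog in-memory search would introduce.
    """
    # Cosine similarity between the query and every memory key.
    q = query / np.linalg.norm(query)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    scores = keys @ q
    if noise_std > 0.0:
        rng = rng or np.random.default_rng()
        scores = scores + rng.normal(0.0, noise_std, size=scores.shape)
    return np.argsort(scores)[-k:][::-1]  # top-k indices, best first
```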
- PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference [57.53291046180288]
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference.
We propose PyramidInfer, a method that compresses the KV cache by layer-wise retaining crucial context.
PyramidInfer improves throughput by 2.2x over Accelerate while reducing KV cache GPU memory by more than 54%.
arXiv Detail & Related papers (2024-05-21T06:46:37Z)
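The PyramidInfer summary above describes layer-wise retention of "crucial context". A hedged sketch of that general idea, pruning cached keys/values down to the positions carrying the most recent attention mass; the selection rule and the per-layer keep ratio are illustrative assumptions, not PyramidInfer's actual algorithm:

```python
import numpy as np

def prune_kv_cache(keys, values, attn_weights, keep_ratio):
    """Keep only the cache positions that receive the most attention.

    keys, values: (seq_len, head_dim); attn_weights: (seq_len,) attention
    mass per cached position. Shrinking keep_ratio with depth (e.g.
    keep_ratio = 1.0 - 0.05 * layer_index) gives the pyramid shape.
    """
    keep = max(1, int(len(keys) * keep_ratio))
    top = np.sort(np.argsort(attn_weights)[-keep:])  # preserve positional order
    return keys[top], values[top]
```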
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix products with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
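The WTA-CRS entry builds on classic column-row sampling (CRS) for approximating a matrix product. The sketch below shows the plain unbiased CRS estimator; the paper's winner-take-all refinement, which treats the highest-probability indices deterministically, is only noted in a comment:

```python
import numpy as np

def crs_matmul(A, B, c, rng=None):
    """Unbiased column-row sampling estimate of A @ B using c index pairs.

    Sampling probabilities proportional to |A[:, j]| * |B[j, :]| minimize
    the estimator's variance; WTA-CRS reduces variance further by handling
    the highest-probability indices deterministically (not shown here).
    """
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(len(p), size=c, p=p)
    # Scale each sampled outer product by 1 / (c * p_j) for unbiasedness.
    return sum(np.outer(A[:, j], B[j, :]) / (c * p[j]) for j in idx)
```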
- Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning (OCL) aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
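For the continual-learning entry above, a plain exponential moving average over parameters illustrates the "moving average family" of methods for non-stationary objectives; the specific variants the paper studies may differ:

```python
import numpy as np

def ema_update(avg_params, params, decay=0.999):
    """In-place exponential moving average over NumPy parameter arrays."""
    for avg, p in zip(avg_params, params):
        # avg <- decay * avg + (1 - decay) * p
        avg *= decay
        avg += (1.0 - decay) * p

# Usage: keep a slow-moving copy of the learner's weights; the averaged
# copy drifts smoothly even as the online copy chases a shifting objective.
online = [np.zeros(10), np.zeros((10, 4))]
averaged = [w.copy() for w in online]
ema_update(averaged, online)
```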
- Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation [14.36005088171571]
We propose memory-efficient reinforcement learning algorithms based on deep Q-networks (DQN).
Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network.
arXiv Detail & Related papers (2022-05-22T17:02:51Z)
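One plausible form of "consolidating knowledge from the target Q-network to the current Q-network" in the entry above is an auxiliary penalty added to the TD loss. The formulation below is an assumption for illustration, not necessarily the paper's exact objective:

```python
import numpy as np

def consolidated_dqn_loss(q_online, q_target, actions, targets, beta=0.1):
    """TD loss plus a consolidation penalty tying the online network's
    Q-values to the target network's (assumed formulation).

    q_online, q_target: (batch, num_actions) Q-values from each network.
    targets: (batch,) bootstrapped TD targets for the taken actions.
    """
    batch = np.arange(len(actions))
    td_error = q_online[batch, actions] - targets
    td_loss = np.mean(td_error ** 2)
    # Consolidation: keep all action values close to the target network's,
    # reducing forgetting without storing extra transitions.
    consolidation = np.mean((q_online - q_target) ** 2)
    return td_loss + beta * consolidation
```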
- Memory Replay with Data Compression for Continual Learning [80.95444077825852]
We propose memory replay with data compression to reduce the storage cost of old training samples.
We extensively validate this across several benchmarks of class-incremental learning and in a realistic scenario of object detection for autonomous driving.
arXiv Detail & Related papers (2022-02-14T10:26:23Z)
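The storage-side idea of the memory-replay entry above can be sketched with any off-the-shelf codec; here zlib stands in for whatever compressor the paper actually evaluates:

```python
import zlib
import numpy as np

class CompressedReplayBuffer:
    """Store compressed byte strings instead of raw arrays (illustrative)."""

    def __init__(self):
        self._items = []

    def add(self, sample: np.ndarray):
        # Compress the raw bytes; image data might instead use JPEG.
        self._items.append((zlib.compress(sample.tobytes()),
                            sample.dtype, sample.shape))

    def sample(self, rng):
        # Decompress on demand; storage cost is paid only in compressed form.
        blob, dtype, shape = self._items[rng.integers(len(self._items))]
        return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)
```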
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can roughly halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
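The Mesa entry above hinges on a quantize-on-save, dequantize-on-backward round trip for activations. A generic per-tensor 8-bit sketch; Mesa's actual quantization granularity and framework integration are more involved:

```python
import numpy as np

def quantize_for_backward(x: np.ndarray):
    """Save activations as uint8 plus scale/offset (asymmetric, per-tensor).

    The forward pass still computes with the exact tensor; only this
    compact copy is kept around for the backward pass.
    """
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant tensors
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover an approximate activation tensor for gradient computation."""
    return q.astype(np.float32) * scale + lo
```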
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
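SEER's saving comes from freezing the encoder early in training and thereafter replaying its compact embeddings instead of raw frames. A minimal sketch of the storage side, with an assumed frozen `encoder` callable:

```python
import numpy as np

class EmbeddingReplayBuffer:
    """After the encoder is frozen, store compact latents instead of frames."""

    def __init__(self, encoder, capacity, latent_dim):
        self.encoder = encoder  # frozen feature extractor
        self.latents = np.zeros((capacity, latent_dim), dtype=np.float32)
        self.ptr, self.full, self.capacity = 0, False, capacity

    def add(self, observation):
        # Encode once at insertion time; later gradient steps reuse the
        # stored latent, skipping the expensive encoder forward pass.
        self.latents[self.ptr] = self.encoder(observation)
        self.ptr = (self.ptr + 1) % self.capacity
        self.full = self.full or self.ptr == 0
```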
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
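SliceOut's speedup comes from dropping a contiguous slice of units, so the surviving weights stay dense in memory and the matmul genuinely shrinks. A toy forward step with inverted-dropout-style rescaling (details simplified relative to the paper):

```python
import numpy as np

def sliceout_linear(x, W, b, keep, rng):
    """Apply a linear layer to a random contiguous slice of hidden units.

    Keeping a contiguous block (rather than a scattered mask) preserves
    dense memory layout, so the smaller matmul runs faster on hardware.
    """
    n = W.shape[1]
    start = rng.integers(0, n - keep + 1)
    cols = slice(start, start + keep)
    # Rescale like inverted dropout so expected activations match full width.
    return (x @ W[:, cols] + b[cols]) * (n / keep)
```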