Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning
- URL: http://arxiv.org/abs/1907.05861v2
- Date: Thu, 28 Dec 2023 01:18:03 GMT
- Title: Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning
- Authors: Thomy Phan, Thomas Gabor, Robert Müller, Christoph Roch, Claudia
Linnhoff-Popien
- Abstract summary: We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning.
SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model.
- Score: 9.805886870200872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general
memory bounded approach to partially observable open-loop planning. SYMBOL
maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded
by the planning horizon and can be automatically adapted according to the
underlying domain without any prior domain knowledge beyond a generative model.
We empirically test SYMBOL in four large POMDP benchmark problems to
demonstrate its effectiveness and robustness w.r.t. the choice of
hyperparameters and evaluate its adaptive memory consumption. We also compare
its performance with other open-loop planning algorithms and POMCP.
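The mechanism in the abstract is concrete enough to sketch. Below is a minimal, illustrative Python sketch of open-loop planning with an adaptive stack of Thompson Sampling bandits; `gen_model` (a generative model returning next state, reward, and a termination flag) and `belief.sample()` are assumed interfaces, and the Gaussian posterior and update rule are simplifications of the paper's bandit model.

```python
import math
import random

class TSBandit:
    """Gaussian Thompson Sampling over a discrete action set
    (a simplified stand-in for SYMBOL's bandit model)."""
    def __init__(self, actions):
        self.stats = {a: [0, 0.0] for a in actions}  # count, running mean

    def sample_action(self):
        # One posterior draw per action; uncertainty shrinks with count.
        def draw(n, mean):
            return random.gauss(mean, 1.0 / math.sqrt(n + 1))
        return max(self.stats, key=lambda a: draw(*self.stats[a]))

    def update(self, action, ret):
        n, mean = self.stats[action]
        self.stats[action] = [n + 1, mean + (ret - mean) / (n + 1)]

def symbol_plan(gen_model, belief, actions, horizon, n_sims=1000, gamma=0.99):
    """Open-loop planning with a stack of bandits: memory is bounded by
    the horizon (one bandit per planned step), not by a search tree."""
    stack = []
    for _ in range(n_sims):
        state, ret, discount, executed = belief.sample(), 0.0, 1.0, []
        for depth in range(horizon):
            if depth >= len(stack):
                stack.append(TSBandit(actions))  # grow the stack on demand
            action = stack[depth].sample_action()
            state, reward, done = gen_model(state, action)
            ret += discount * reward
            discount *= gamma
            executed.append(action)
            if done:
                break
        for depth, action in enumerate(executed):
            stack[depth].update(action, ret)  # propagate the sampled return
    return stack[0].sample_action()  # recommended first action
```

The paper's adaptation rule for the stack size is more refined than the lazy growth shown here, but the memory bound is the same: at most `horizon` bandits are ever held.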
Related papers
- Futureproof Static Memory Planning [7.031511274524772]
"AI memory wall" combined with deep neural networks' static architecture has reignited interest in dynamic storage allocation.
We present idealloc, a low-fragmentation, high-performance DSA implementation designed for million-buffer instances.
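To ground the term, DSA here means assigning address offsets to buffers with known lifetimes so that concurrently-live buffers never overlap in memory, while keeping the peak address low. The toy greedy placement below illustrates the problem setting only; it is not idealloc's algorithm.

```python
def greedy_offsets(buffers):
    """Toy greedy placement for the DSA setting: each buffer is
    (start, end, size) with live range [start, end); buffers whose
    live ranges overlap must occupy disjoint address ranges. This
    illustrates the problem idealloc solves, not its algorithm.
    """
    placed = []  # (offset, (start, end, size))
    for start, end, size in sorted(buffers, key=lambda b: -b[2]):
        offset = 0
        while True:
            ends = [o + s2 for o, (s1, e1, s2) in placed
                    if s1 < end and start < e1                  # lifetimes overlap
                    and o < offset + size and offset < o + s2]  # addresses too
            if not ends:
                break
            offset = max(ends)  # jump past every current conflict
        placed.append((offset, (start, end, size)))
    peak = max((o + s for o, (_, _, s) in placed), default=0)
    return placed, peak

# e.g. greedy_offsets([(0, 4, 64), (2, 6, 32), (5, 9, 64)]) -> peak 96
```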
arXiv Detail & Related papers (2025-04-07T09:28:54Z)
- FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning [16.60622265961373]
FLAMES is a hybrid framework integrating structured state-space dynamics with event-driven computation.
By bridging neuromorphic computing and structured sequence modeling, FLAMES enables scalable long-range reasoning in event-driven systems.
arXiv Detail & Related papers (2025-04-02T00:08:19Z) - COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs [81.01082659623552]
Large Language Models (LLMs) have demonstrated remarkable success across various domains.
Their optimization remains a significant challenge due to the complex and high-dimensional loss landscapes they inhabit.
arXiv Detail & Related papers (2025-02-24T18:42:19Z) - A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models [22.725326215887435]
We introduce a Randomized Subspace Optimization framework for pre-training and fine-tuning Large Language Models.
Our approach decomposes the high-dimensional training problem into a series of lower-dimensional subproblems.
This structured reduction in dimensionality allows our method to simultaneously reduce memory usage for both activations and states.
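As a rough illustration of the idea (not the paper's exact algorithm), one can project a weight matrix's gradient onto a low-rank random subspace, take the step there, and lift it back; keeping the adaptive optimizer state in the small subspace is where the memory saving comes from.

```python
import numpy as np

def subspace_step(w, grad, rank=8, lr=1e-2, rng=np.random):
    """Illustrative random-subspace step for a weight matrix: compress
    the gradient into a low-rank random subspace, step there, and lift
    the step back to full size. In a real optimizer the adaptive state
    (momentum, second moments) lives in the small subspace, which is
    where the memory saving comes from. Not the paper's exact method.
    """
    m, _ = grad.shape
    proj = rng.standard_normal((m, rank)) / np.sqrt(rank)  # random basis
    low = proj.T @ grad           # (rank, n): subspace view of the gradient
    return w - lr * (proj @ low)  # lift the low-dim step back to (m, n)
```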
arXiv Detail & Related papers (2025-02-11T03:32:10Z)
- Memory Layers at Scale [67.00854080570979]
This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale.
On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the compute budget, as well as mixture-of-experts models when matched for both compute and parameters.
We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, compared against base models with up to 8B parameters.
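A memory layer in this sense is a trainable key-value store accessed by sparse top-k lookup. The sketch below shows the access pattern with a brute-force scoring pass; at the scales in the paper, product-key style indexing replaces the full scan. All names here are illustrative.

```python
import numpy as np

def memory_layer(x, keys, values, k=4):
    """Minimal key-value memory lookup: score the query against all
    keys, softmax over the top-k, and mix their values.
    x: (d,), keys: (N, d), values: (N, d_v).
    """
    scores = keys @ x                       # similarity to every slot
    top = np.argpartition(scores, -k)[-k:]  # indices of the k best slots
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                            # softmax over selected slots
    return w @ values[top]                  # weighted mix of their values
```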
arXiv Detail & Related papers (2024-12-12T23:56:57Z)
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrary small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
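The tiling idea can be illustrated on the contrastive normalizer: accumulate a streaming log-sum-exp over column tiles of the similarity matrix so the full N x N logit matrix never exists in memory. This sketch covers single-device tiling only; the paper's multi-level scheme extends it across distributed workers.

```python
import numpy as np

def tiled_contrastive_normalizer(q, k, tile=256):
    """Per-query log-sum-exp normalizers of q @ k.T, computed one
    column tile at a time so the full (N x N) matrix is never built.
    """
    n = q.shape[0]
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)
    for start in range(0, k.shape[0], tile):
        block = q @ k[start:start + tile].T  # (n, tile) slice of logits
        block_max = block.max(axis=1)
        new_max = np.maximum(running_max, block_max)
        # Rescale the running sum to the new max before accumulating.
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(block - new_max[:, None]).sum(axis=1))
        running_max = new_max
    return running_max + np.log(running_sum)  # logsumexp per query
```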
arXiv Detail & Related papers (2024-10-22T17:59:30Z)
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
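Reading the two routes together, a highly simplified sketch of the shape of the computation might look as follows; the actual anti-redundancy operation and routing in SHERL are more involved, and every name here is hypothetical.

```python
def sherl_style_forward(intermediates, late_layers, fuse_weights):
    """Two-route sketch: fuse cheaply-stored intermediate features
    (early route), then run only the last few pre-trained layers on
    the fused result (late route), so peak training memory scales
    with the late layers rather than the whole backbone.
    """
    fused = sum(w * h for w, h in zip(fuse_weights, intermediates))
    out = fused
    for layer in late_layers:  # only these layers need training memory
        out = layer(out)
    return out
```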
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
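AdaLomo's memory savings come from applying updates as gradients are produced (so full gradients never persist) and from factored second-moment statistics. The sketch below shows an Adafactor-style factored adaptive update as a rough stand-in; it is not the paper's exact update rule.

```python
import numpy as np

def factored_adaptive_update(w, grad, row_v, col_v,
                             lr=1e-3, beta=0.99, eps=1e-8):
    """Adaptive step with a factored second moment: keep one row
    vector and one column vector of squared-gradient statistics
    instead of a full m x n matrix. A sketch of the memory-saving
    idea, not AdaLomo's exact rule.
    """
    g2 = grad * grad
    row_v[:] = beta * row_v + (1 - beta) * g2.mean(axis=1)  # per-row stat
    col_v[:] = beta * col_v + (1 - beta) * g2.mean(axis=0)  # per-col stat
    # Rank-1 reconstruction of the second moment.
    v_hat = np.outer(row_v, col_v) / max(row_v.mean(), eps)
    w -= lr * grad / (np.sqrt(v_hat) + eps)
    return w
```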
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization [12.411844611718958]
We show that our method outputs a near-optimal policy after a number of queries to the generative model.
Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector.
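That compact output is easy to picture: the entire policy is one parameter vector, and action probabilities are a softmax over linear feature scores. In this sketch, `features(state, action)` is a hypothetical featurizer returning a vector of `theta`'s dimension.

```python
import numpy as np

def softmax_policy(theta, features, state, actions):
    """Compact policy representation: the whole policy is the
    low-dimensional vector `theta`; probabilities are a softmax
    over linear scores of state-action features.
    """
    logits = np.array([theta @ features(state, a) for a in actions])
    p = np.exp(logits - logits.max())  # stabilized softmax
    return p / p.sum()
```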
arXiv Detail & Related papers (2022-10-21T15:49:20Z)
- Memory-Efficient Differentiable Programming for Quantum Optimal Control of Discrete Lattices [1.5012666537539614]
Quantum optimal control problems are typically solved by gradient-based algorithms such as GRAPE.
In quantum optimal control (QOC), memory requirements are a barrier to simulating large models or long time spans.
We employ a nonstandard differentiable programming approach that significantly reduces the memory requirements at the cost of a reasonable amount of recomputation.
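The memory-for-recompute trade-off being described is essentially checkpointing: keep a sparse subset of intermediate states during the forward pass and regenerate each segment on demand during the reverse pass. A minimal sketch, with `step` as an assumed one-step simulator:

```python
def checkpointed_reverse(step, x0, n_steps, stride=10):
    """Store only every `stride`-th forward state, then recompute each
    segment from its checkpoint while walking backwards, instead of
    keeping all n_steps states alive at once.
    """
    # Forward pass: keep sparse checkpoints only.
    checkpoints, x = {0: x0}, x0
    for t in range(1, n_steps + 1):
        x = step(x)
        if t % stride == 0:
            checkpoints[t] = x
    # Reverse pass: rebuild each needed state from its checkpoint.
    for t in range(n_steps, 0, -1):
        base = (t - 1) // stride * stride
        x = checkpoints[base]
        for _ in range(t - 1 - base):
            x = step(x)  # x is now the state before step t
        yield t, x       # consume states in reverse order
```

Peak memory drops from `n_steps` states to roughly `n_steps / stride` checkpoints plus one active segment, at the cost of about one extra forward sweep of recomputation.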
arXiv Detail & Related papers (2022-10-15T20:59:23Z)
- Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest [0.0]
In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance.
We experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams.
We conclude that our method can achieve up to 95% of the performance of an optimally-sized Mondrian forest for stable datasets.
arXiv Detail & Related papers (2022-10-11T18:05:58Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental
Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model that learns new classes over time within a limited memory budget.
We show that when model size is counted toward the total budget and methods are compared at aligned memory sizes, saving models instead of exemplars does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Differentiable Random Access Memory using Lattices [0.0]
We introduce a differentiable random access memory module with $O(1)$ performance regardless of size.
The design stores entries on points of a chosen lattice to calculate nearest neighbours of arbitrary points efficiently by exploiting symmetries.
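The key property is that nearest-lattice-point decoding needs no search over the stored entries. For the integer lattice Z^n it is just element-wise rounding, which is the simplest instance of the O(1) lookup the paper exploits (its lattices and interpolation are richer):

```python
import numpy as np

def nearest_lattice_point(x):
    """For the integer lattice Z^n, the nearest lattice point to any
    query is element-wise rounding: an O(1)-per-dimension decode with
    no search over stored entries.
    """
    return np.rint(x).astype(int)

# A toy memory module can then key its storage by the decoded point:
memory = {}
def write(x, value): memory[tuple(nearest_lattice_point(x))] = value
def read(x): return memory.get(tuple(nearest_lattice_point(x)))
```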
arXiv Detail & Related papers (2021-07-07T20:55:42Z)
- Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
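For intuition, the baseline form of memory sharing is hashed embeddings: every token reads a few rows of one small shared table. SCMA's contribution is choosing the sharing pattern from semantic overlap rather than from hashes alone; the sketch below shows only the hashing baseline.

```python
import numpy as np

def shared_embedding(token_id, table, n_hashes=2):
    """Memory-shared embedding sketch: each token sums several rows of
    one shared table (chosen by hashes), so the table can be far
    smaller than vocab_size x dim. Plain hashing ignores semantics;
    SCMA ties the sharing pattern to semantic overlap.
    """
    rows = [hash((token_id, i)) % table.shape[0] for i in range(n_hashes)]
    return table[rows].sum(axis=0)
```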
arXiv Detail & Related papers (2021-02-24T19:55:49Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
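The product construction can be sketched abstractly: a product state pairs an MDP state with an automaton state, and the automaton advances on the labels of the state the MDP reaches. All interfaces below are hypothetical simplifications of the paper's EP-MDP.

```python
def product_mdp_step(mdp_step, automaton_delta, state, q, action, labels):
    """One transition of a product construction in the spirit of the
    EP-MDP: advance the MDP, then synchronize the automaton state on
    the labels of the reached MDP state.
    """
    next_state, reward = mdp_step(state, action)
    next_q = automaton_delta(q, labels(next_state))  # sync with LDGBA
    # Reward shaping (per the paper) would add a bonus when next_q
    # enters an accepting set of the automaton.
    return (next_state, next_q), reward
```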
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.