Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement
- URL: http://arxiv.org/abs/2512.18950v1
- Date: Mon, 22 Dec 2025 01:56:28 GMT
- Title: Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement
- Authors: Saman Forouzandeh, Wei Peng, Parham Moradi, Xinghuo Yu, Mahdi Jalili
- Abstract summary: We present MACLA, a framework that decouples reasoning from learning by maintaining a frozen large language model while performing all adaptation in an external hierarchical procedural memory. MACLA extracts reusable procedures from trajectories, tracks reliability via Bayesian posteriors, selects actions through expected-utility scoring, and refines procedures by contrasting successes and failures. Across four benchmarks (ALFWorld, WebShop, TravelPlanner, InterCodeSQL), MACLA achieves 78.1 percent average performance, outperforming all baselines.
- Score: 23.31711942240935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MACLA, a framework that decouples reasoning from learning by maintaining a frozen large language model while performing all adaptation in an external hierarchical procedural memory. MACLA extracts reusable procedures from trajectories, tracks reliability via Bayesian posteriors, selects actions through expected-utility scoring, and refines procedures by contrasting successes and failures. Across four benchmarks (ALFWorld, WebShop, TravelPlanner, InterCodeSQL), MACLA achieves 78.1 percent average performance, outperforming all baselines. On ALFWorld unseen tasks, MACLA reaches 90.3 percent with 3.1 percent positive generalization. The system constructs memory in 56 seconds, 2800 times faster than the state-of-the-art LLM parameter-training baseline, compressing 2851 trajectories into 187 procedures. Experimental results demonstrate that structured external memory with Bayesian selection and contrastive refinement enables sample-efficient, interpretable, and continually improving agents without LLM parameter updates.
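As a rough illustration of the selection mechanism the abstract describes (Beta posteriors tracking procedure reliability, combined with expected-utility scoring), here is a minimal Python sketch. The `Procedure` class, the `select_procedure` helper, and the utility function are hypothetical stand-ins, not MACLA's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    """A stored procedure with a Beta posterior over its reliability."""
    name: str
    successes: int = 1  # Beta prior alpha = 1 (uniform prior)
    failures: int = 1   # Beta prior beta = 1

    @property
    def reliability(self) -> float:
        # Posterior mean of Beta(successes, failures)
        return self.successes / (self.successes + self.failures)


def update(p: Procedure, succeeded: bool) -> None:
    """Bayesian update after observing one execution outcome."""
    if succeeded:
        p.successes += 1
    else:
        p.failures += 1


def select_procedure(candidates, utility):
    """Pick the candidate maximizing expected utility:
    posterior reliability times a task-specific utility estimate."""
    return max(candidates, key=lambda p: p.reliability * utility(p))
```

With a uniform prior, two observed successes move a procedure's posterior mean from 0.5 to 0.75, so it would be preferred over an untried procedure at equal utility.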
Related papers
- Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation [95.89924101984566]
We introduce OptimusVLA, a dual-memory VLA framework with Global Prior Memory (GPM) and Local Consistency Memory (LCM). GPM replaces Gaussian noise with task-level priors retrieved from semantically similar trajectories. LCM injects a learned consistency constraint that enforces temporal coherence and smoothness of trajectories.
arXiv Detail & Related papers (2026-02-22T15:39:34Z)
- Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z)
- MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping [52.02659589971978]
We propose MoDES, the first training-free framework that adaptively skips experts to enable efficient and accurate MoE MLLM inference. MoDES significantly enhances inference speed, improving prefilling time by 2.16$\times$ and decoding time by 1.26$\times$.
arXiv Detail & Related papers (2025-11-19T18:48:27Z)
- Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches [8.864020712680976]
We introduce an annotated corpus of 6,393 radiology reports from 586 patients, each labeled for follow-up imaging status. We compare traditional machine-learning classifiers, including logistic regression (LR), support vector machines (SVM), Longformer, and a fully fine-tuned Llama3-8B-Instruct. To evaluate generative LLMs, we tested GPT-4o and the open-source GPT-OSS-20B under two configurations.
arXiv Detail & Related papers (2025-11-14T20:55:44Z)
- Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models [7.075648770762989]
Fine-tuning large language models with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, it is often unclear how well the fine-tuned LLM will generalize, i.e., how well it will perform on unseen datasets. We propose Amortized Bayesian Meta-Learning for LoRA (ABMLL), which improves generalization and scales to large models.
arXiv Detail & Related papers (2025-08-19T21:57:59Z)
- VAULT: Vigilant Adversarial Updates via LLM-Driven Retrieval-Augmented Generation for NLI [15.320553375828045]
VAULT is a fully automated adversarial RAG pipeline that uncovers and remedies weaknesses in NLI models. VAULT consistently outperforms prior in-context adversarial methods by up to 2.0% across datasets.
arXiv Detail & Related papers (2025-08-01T14:22:54Z)
- Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains [2.1797343876622097]
Using large language models (LLMs) as priors in reinforcement learning (RL) offers significant advantages but comes with substantial computational costs. We present a principled cache-efficient framework for posterior sampling with LLM-derived priors that dramatically reduces these costs while maintaining high performance.
arXiv Detail & Related papers (2025-05-12T06:53:24Z)
- Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach [6.93983229112122]
We propose an ensemble approach for large language models (LLMs) in code generation. For voting, we compute syntactic and semantic similarity using CodeBLEU and behavioral equivalence. We show through experiments that our ensemble approach consistently outperforms standalone LLMs.
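The voting idea above (pick the candidate most similar to the rest of the ensemble's outputs) can be sketched in a few lines. This is a generic consensus-selection sketch, not the paper's method: it uses stdlib `difflib` similarity as a crude stand-in for CodeBLEU and ignores behavioral equivalence:

```python
import difflib


def similarity(a: str, b: str) -> float:
    # Crude stand-in for CodeBLEU: plain sequence similarity in [0, 1].
    return difflib.SequenceMatcher(None, a, b).ratio()


def select_by_consensus(candidates: list[str]) -> str:
    """Return the candidate most similar on average to all the others,
    i.e. the 'centroid' of the ensemble's outputs."""
    def avg_sim(c: str) -> float:
        others = [o for o in candidates if o is not c]
        return sum(similarity(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=avg_sim)
```

An outlier generation that disagrees with the rest of the ensemble gets a low average similarity and is voted out.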
arXiv Detail & Related papers (2025-03-20T04:38:56Z)
- Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses. We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
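The routing rule described above reduces to a confidence threshold. A minimal sketch, assuming each model is a callable and that the SLM reports a confidence score alongside its answer (the callables, threshold, and return format are illustrative, not the paper's setup):

```python
def route(query, slm_generate, llm_generate, threshold=0.8):
    """Answer with the on-device SLM when it is confident enough,
    otherwise offload the query to the stronger LLM."""
    answer, confidence = slm_generate(query)
    if confidence >= threshold:
        return answer, "slm"
    return llm_generate(query), "llm"
```

Tuning `threshold` trades off cost (LLM calls) against risk (acting on uncertain SLM answers), which is the design space the paper benchmarks.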
arXiv Detail & Related papers (2025-02-06T18:59:11Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. However, LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks. Experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models demonstrate the promising performance of our method in efficiency and effectiveness.
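The core trick above (optimizing Bernoulli mask probabilities with a score-function policy gradient instead of back-propagating through the model) can be shown on a toy scale. This is a generic REINFORCE sketch on a stand-in loss, not the paper's algorithm or scale:

```python
import random


def sample_mask(probs):
    """Sample a binary pruning mask from per-unit Bernoulli probabilities."""
    return [1 if random.random() < p else 0 for p in probs]


def reinforce_step(probs, loss_fn, lr=0.1, n_samples=16):
    """One policy-gradient step on E[loss], estimated with the score
    function: loss * d log P(mask) / d p, averaged over sampled masks."""
    grads = [0.0] * len(probs)
    for _ in range(n_samples):
        mask = sample_mask(probs)
        loss = loss_fn(mask)
        for i, (m, p) in enumerate(zip(mask, probs)):
            # d log P(m) / d p = 1/p if m == 1 else -1/(1 - p)
            score = 1.0 / p if m == 1 else -1.0 / (1.0 - p)
            grads[i] += loss * score / n_samples
    # Gradient descent on the expected loss; keep probabilities in (0, 1)
    return [min(0.99, max(0.01, p - lr * g)) for p, g in zip(probs, grads)]
```

Because only forward evaluations of `loss_fn` are needed, this bypasses back-propagation entirely, which is the point of the paper's title.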
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
- Regression-aware Inference with LLMs [52.764328080398805]
We show that an inference strategy can be sub-optimal for common regression and scoring evaluation metrics.
We propose alternate inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed-form from sampled responses.
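The closed-form estimates mentioned above have classical instances: under squared error the Bayes-optimal point prediction is the mean of the sampled responses, and under absolute error it is the median. A generic sketch of this idea (not the paper's exact estimator; the function and metric names are illustrative):

```python
import statistics


def bayes_optimal_prediction(samples: list[float], metric: str) -> float:
    """Closed-form point estimate from sampled LLM responses: the mean
    minimizes expected squared error and the median minimizes expected
    absolute error, under the empirical distribution of the samples."""
    if metric == "squared_error":
        return statistics.fmean(samples)
    if metric == "absolute_error":
        return statistics.median(samples)
    raise ValueError(f"unknown metric: {metric}")
```

The practical upshot is that simply taking one sampled response (or the most likely one) can be sub-optimal for these metrics, while aggregating samples is optimal in closed form.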
arXiv Detail & Related papers (2024-03-07T03:24:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.