Related papers: A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits

URL: http://arxiv.org/abs/2601.12945v2
Date: Wed, 21 Jan 2026 06:29:58 GMT
Title: A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits
Authors: Miao Xie, Siguang Chen, Chunli Lv
Abstract summary: Large language models (LLMs) have become powerful and widely used systems for language understanding and generation. Multi-armed bandit (MAB) algorithms provide a principled framework for adaptive decision-making under uncertainty. This survey explores the potential at the intersection of these two fields.
Score: 2.969473917919491
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have become powerful and widely used systems for language understanding and generation, while multi-armed bandit (MAB) algorithms provide a principled framework for adaptive decision-making under uncertainty. This survey explores the potential at the intersection of these two fields. To our knowledge, it is the first survey to systematically review the bidirectional interaction between large language models and multi-armed bandits at the component level. We highlight the bidirectional benefits: MAB algorithms address critical LLM challenges, spanning from pre-training to retrieval-augmented generation (RAG) and personalization. Conversely, LLMs enhance MAB systems by redefining core components such as arm definition and environment modeling, thereby improving decision-making in sequential tasks. We analyze existing LLM-enhanced bandit systems and bandit-enhanced LLM systems, providing insights into their design, methodologies, and performance. Key challenges and representative findings are identified to help guide future research. An accompanying GitHub repository that indexes relevant literature is available at https://github.com/bucky1119/Awesome-LLM-Bandit-Interaction.

Related papers

Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey [69.45421620616486]
This work presents the first structured taxonomy and analysis of discrete tokenization methods designed for large language models (LLMs). We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines. We identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints.
arXiv Detail & Related papers (2025-07-21T10:52:14Z)
A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges [5.750235776275005]
Large Language Models (LLMs) can be leveraged to tackle key challenges in recommender systems. LLMs enhance personalization, semantic alignment, and interpretability without requiring extensive task-specific supervision. LLMs enable zero- and few-shot reasoning, allowing systems to operate effectively in cold-start and long-tail scenarios.
arXiv Detail & Related papers (2025-07-17T06:03:57Z)
Survey: Multi-Armed Bandits Meet Large Language Models [6.718566736462752]
Bandit algorithms and Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence. We first examine the role of bandit algorithms in optimizing LLM fine-tuning, prompt engineering, and adaptive response generation. We then explore how LLMs can augment bandit algorithms through advanced contextual understanding, dynamic adaptation, and improved policy selection using natural language reasoning.
arXiv Detail & Related papers (2025-05-19T16:57:57Z)
Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation [58.799397354312596]
Large language models (LLMs) have demonstrated remarkable capabilities in various domains, particularly in System 1 tasks. Research on System2-to-System1 methods has recently surged, exploring System 2 reasoning knowledge via inference-time computation. In this paper, we focus on code generation, which is a representative System 2 task, and identify two primary challenges.
arXiv Detail & Related papers (2025-02-18T03:20:50Z)
Large Language Model-Enhanced Multi-Armed Bandits [43.34246396804588]
Large language models (LLMs) have been adopted to solve sequential decision-making tasks such as multi-armed bandits (MAB). We propose an alternative approach which combines the strengths of classical MAB and LLMs. We conduct empirical evaluations using both synthetic MAB tasks and experiments designed using real-world text datasets.
arXiv Detail & Related papers (2025-02-03T07:19:05Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems. The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness. This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs). Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP). This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.