LLMs-augmented Contextual Bandit
- URL: http://arxiv.org/abs/2311.02268v1
- Date: Fri, 3 Nov 2023 23:12:57 GMT
- Title: LLMs-augmented Contextual Bandit
- Authors: Ali Baheri, Cecilia O. Alm
- Abstract summary: We propose a novel integration of large language models (LLMs) with the contextual bandit framework.
Preliminary results on synthetic datasets demonstrate the potential of this approach.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual bandits have emerged as a cornerstone in reinforcement learning,
enabling systems to make decisions with partial feedback. However, as contexts
grow in complexity, traditional bandit algorithms can face challenges in
adequately capturing and utilizing such contexts. In this paper, we propose a
novel integration of large language models (LLMs) with the contextual bandit
framework. By leveraging LLMs as an encoder, we enrich the representation of
the context, providing the bandit with a denser and more informative view.
Preliminary results on synthetic datasets demonstrate the potential of this
approach, showing notable improvements in cumulative rewards and reductions in
regret compared to traditional bandit algorithms. This integration not only
showcases the capabilities of LLMs in reinforcement learning but also opens the
door to a new era of contextually aware decision systems.
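The abstract's core idea, an LLM acting as a context encoder whose embedding feeds a linear contextual bandit, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `encode_context` function is a deterministic hashing-based stand-in for a real LLM embedding call, and the bandit is standard per-arm LinUCB.

```python
# Sketch: LLM-as-encoder feeding a LinUCB contextual bandit.
# `encode_context` is a placeholder for an LLM embedding endpoint.
import hashlib
import numpy as np

def encode_context(text, dim=16):
    """Stand-in for an LLM encoder: deterministic pseudo-embedding."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy online loop on synthetic contexts and Bernoulli feedback.
bandit = LinUCB(n_arms=3, dim=16)
for t in range(200):
    x = encode_context(f"user context {t % 5}")
    arm = bandit.select(x)
    reward = float(np.random.default_rng(t).random() < 0.5)
    bandit.update(arm, x, reward)
```

In the paper's setting, the denser LLM representation replaces hand-engineered context features; the bandit machinery itself is unchanged.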
Related papers
- Teaching Models to Improve on Tape [30.330699770714165]
Large Language Models (LLMs) often struggle when prompted to generate content under specific constraints.
Recent work has shown that LLMs can benefit from such "corrective feedback."
We introduce an RL framework for teaching models to use such feedback by simulating interaction sessions and rewarding the model according to its ability to satisfy the constraints.
arXiv Detail & Related papers (2024-11-03T08:49:55Z)
- Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval [23.94611751368491]
We investigate the feasibility of leveraging large language models (LLMs) for integrating general knowledge and incorporating pseudo-events as priors for temporal content distribution.
To overcome these limitations, we propose utilizing LLM encoders instead of decoders.
We present a general framework for integrating LLM encoders into existing VMR architectures, specifically within the fusion module.
arXiv Detail & Related papers (2024-07-21T04:39:06Z)
- Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge [76.45868419402265]
Multimodal large language models (MLLMs) have made significant strides by training on vast, high-quality image-text datasets.
However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs.
This paper proposes a new visual prompt approach to integrate fine-grained external knowledge, gleaned from specialized vision models, into MLLMs.
arXiv Detail & Related papers (2024-07-05T17:43:30Z)
- Jump Starting Bandits with LLM-Generated Prior Knowledge [5.344012058238259]
We show that Large Language Models can jump-start contextual multi-armed bandits to reduce online learning regret.
We propose an algorithm for contextual bandits by prompting LLMs to produce a pre-training dataset of approximate human preferences for the bandit.
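The jump-start idea above, folding an LLM-generated dataset of approximate preferences into a bandit's prior before any online interaction, can be sketched with Thompson sampling. This is an illustrative reading, not that paper's algorithm; the `llm_preferences` list is a hypothetical stand-in for offline LLM output.

```python
# Sketch: warm-starting a Bernoulli Thompson-sampling bandit with
# pseudo-counts from an assumed LLM-generated preference dataset.
import random

# (arm, approval) pairs an LLM might produce when asked to judge each arm.
llm_preferences = [
    (0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0), (2, 0),
]

n_arms = 3
alpha = [1] * n_arms  # Beta prior: success counts
beta = [1] * n_arms   # Beta prior: failure counts

# Offline phase: fold the LLM-generated dataset into the prior,
# so online learning starts from informed posteriors.
for arm, approval in llm_preferences:
    if approval:
        alpha[arm] += 1
    else:
        beta[arm] += 1

def select_arm(rng=random):
    """Thompson sampling: draw from each arm's Beta posterior."""
    samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: samples[a])
```

Arms the LLM rates favorably start with higher posterior means, which is what reduces early-round regret relative to a uniform prior.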
arXiv Detail & Related papers (2024-06-27T16:52:19Z)
- Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
- Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs [67.56087611675606]
Large language models (LLMs) can generate summaries that are factually inconsistent with the original articles.
These hallucinations are challenging to detect through traditional methods.
We propose DECENT, an adversarial DEcoupling method that disentangles the comprehension and embellishment abilities of LLMs.
arXiv Detail & Related papers (2023-10-30T08:40:16Z)
- Unified Risk Analysis for Weakly Supervised Learning [65.75775694815172]
We introduce a framework providing a comprehensive understanding and a unified methodology for weakly supervised learning.
The formulation component of the framework, leveraging a contamination perspective, provides a unified interpretation of how weak supervision is formed.
The analysis component of the framework, viewed as a decontamination process, provides a systematic method for conducting risk rewriting.
arXiv Detail & Related papers (2023-09-15T07:30:15Z)
- Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models [58.17315970207874]
We propose a zero-shot method for adapting generalisable visual-textual priors from an arbitrary VLM to facilitate moment-text alignment.
Experiments conducted on three VMR benchmark datasets demonstrate the notable performance advantages of our zero-shot algorithm.
arXiv Detail & Related papers (2023-09-01T13:06:50Z)
- Practical Contextual Bandits with Feedback Graphs [44.76976254893256]
We propose and analyze an approach to contextual bandits with feedback graphs based upon reduction to regression.
The resulting algorithms are computationally practical and achieve established minimax rates.
arXiv Detail & Related papers (2023-02-17T00:06:42Z)
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- Stochastic Linear Contextual Bandits with Diverse Contexts [17.35270010828849]
We show that when contexts are sufficiently diverse, the learner is able to utilize the information obtained during exploitation to shorten the exploration process.
We design the LinUCB-d algorithm, and propose a novel approach to analyze its regret performance.
arXiv Detail & Related papers (2020-03-05T14:51:17Z)
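The diversity intuition behind the entry above can be illustrated numerically: when contexts are sufficiently diverse, the Gram matrix accumulated during exploitation becomes well-conditioned on its own, so confidence intervals shrink in every direction without explicit exploration. This sketch is an assumption-laden illustration of that mechanism, not the LinUCB-d algorithm itself.

```python
# Sketch: with diverse (here, isotropic unit-norm) contexts, the minimum
# eigenvalue of V_t = I + sum_s x_s x_s^T grows with t, so the confidence
# width sqrt(x^T V_t^{-1} x) <= 1/sqrt(lambda_min(V_t)) shrinks everywhere.
import numpy as np

rng = np.random.default_rng(0)
dim = 5
V = np.eye(dim)  # ridge regularizer

min_eigs = []
for t in range(100):
    x = rng.standard_normal(dim)  # a "diverse" context draw
    x /= np.linalg.norm(x)
    V += np.outer(x, x)
    min_eigs.append(np.linalg.eigvalsh(V)[0])  # smallest eigenvalue

# Eigenvalues only grow under PSD rank-one updates (Weyl's inequality),
# and diverse contexts make the growth uniform across directions.
assert min_eigs[-1] > min_eigs[0]
```

With degenerate contexts (all draws along one direction), `lambda_min` would stay near 1 and exploitation data alone would never tighten the estimate, which is exactly the case diversity rules out.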
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.