Reward Reports for Reinforcement Learning
- URL: http://arxiv.org/abs/2204.10817v3
- Date: Mon, 20 Mar 2023 03:39:51 GMT
- Title: Reward Reports for Reinforcement Learning
- Authors: Thomas Krendl Gilbert, Nathan Lambert, Sarah Dean, Tom Zick and Aaron
Snoswell
- Abstract summary: We sketch a framework for documenting deployed and iteratively updated learning systems, which we call Reward Reports.
Taking inspiration from various contributions to the technical literature on reinforcement learning, we outline Reward Reports as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for.
- Score: 3.7568608766189597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building systems that are good for society in the face of complex societal
effects requires a dynamic approach. Recent approaches to machine learning (ML)
documentation have demonstrated the promise of discursive frameworks for
deliberation about these complexities. However, these developments have been
grounded in a static ML paradigm, leaving the role of feedback and
post-deployment performance unexamined. Meanwhile, recent work in reinforcement
learning has shown that the effects of feedback and optimization objectives on
system behavior can be wide-ranging and unpredictable. In this paper we sketch
a framework for documenting deployed and iteratively updated learning systems,
which we call Reward Reports. Taking inspiration from various contributions to
the technical literature on reinforcement learning, we outline Reward Reports
as living documents that track updates to design choices and assumptions behind
what a particular automated system is optimizing for. They are intended to
track dynamic phenomena arising from system deployment, rather than merely
static properties of models or data. After presenting the elements of a Reward
Report, we discuss a concrete example: Meta's BlenderBot 3 chatbot. Several
others for game-playing (DeepMind's MuZero), content recommendation
(MovieLens), and traffic control (Project Flow) are included in the appendix.
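The "living document" idea described in the abstract can be illustrated with a minimal sketch: an append-only log of revisions to what a deployed system is optimizing for. This is purely hypothetical scaffolding, assuming nothing about the paper's actual Reward Report template; the class and field names (`RewardReport`, `reward_specification`, `design_changes`, `observed_feedback`) are illustrative placeholders, not the authors' elements.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RewardReportEntry:
    """One dated revision of a deployed system's optimization assumptions."""
    date: str
    reward_specification: str     # what the system is currently optimizing for
    design_changes: List[str]     # updates to objectives, metrics, or constraints
    observed_feedback: List[str]  # post-deployment dynamics worth recording

@dataclass
class RewardReport:
    """A living document: an append-only log of reward-design revisions."""
    system_name: str
    entries: List[RewardReportEntry] = field(default_factory=list)

    def add_entry(self, entry: RewardReportEntry) -> None:
        self.entries.append(entry)

    def current_specification(self) -> str:
        # The most recent entry reflects what the system optimizes for today.
        return self.entries[-1].reward_specification if self.entries else "unspecified"

# Hypothetical usage: a recommender whose objective was revised after deployment.
report = RewardReport("example-recommender")
report.add_entry(RewardReportEntry(
    date="2023-01-15",
    reward_specification="maximize click-through rate",
    design_changes=["initial deployment"],
    observed_feedback=[],
))
report.add_entry(RewardReportEntry(
    date="2023-03-20",
    reward_specification="maximize long-term engagement, penalize clickbait",
    design_changes=["added clickbait penalty term"],
    observed_feedback=["CTR objective amplified sensational content"],
))
print(report.current_specification())
```

The point of the sketch is that the report tracks the system's objective as a dynamic, dated history rather than a one-time static description, mirroring the paper's contrast with static ML documentation.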
Related papers
- SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal.
Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise.
We propose SoliReward, a systematic framework for video RM training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z)
- Multimodal Peer Review Simulation with Actionable To-Do Recommendations for Community-Aware Manuscript Revisions [16.556181117253473]
We present an interactive web-based system for multimodal, community-aware peer review simulation to enable effective manuscript revisions before paper submission.
Our framework integrates textual and visual information through multimodal LLMs and enhances review quality via retrieval-augmented generation (RAG) grounded in web-scale OpenReview data.
The system integrates seamlessly into existing academic writing platforms, providing interactive interfaces for real-time feedback and revision tracking.
arXiv Detail & Related papers (2025-11-14T02:29:23Z)
- Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents.
ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts.
We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
arXiv Detail & Related papers (2025-10-11T18:11:09Z)
- Bridging Collaborative Filtering and Large Language Models with Dynamic Alignment, Multimodal Fusion and Evidence-grounded Explanations [1.3702600718499687]
We develop an online adaptation mechanism that incorporates new user interactions through lightweight modules.
We create a unified representation that seamlessly combines collaborative signals with visual and audio features.
Our approach maintains the efficiency of frozen base models while adding minimal computational overhead, making it practical for real-world deployment.
arXiv Detail & Related papers (2025-10-02T02:43:24Z)
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency [56.475612147721264]
We propose a dual-reward formulation that supervises both semantic and temporal reasoning through discrete and continuous reward signals.
We evaluate our approach across eight representative video understanding tasks, including VideoQA, Temporal Video Grounding, and Grounded VideoQA.
Results underscore the importance of reward design and data selection in advancing reasoning-centric video understanding with MLLMs.
arXiv Detail & Related papers (2025-06-02T17:28:26Z)
- Training Plug-n-Play Knowledge Modules with Deep Context Distillation [52.94830874557649]
In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs).
KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents.
Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets.
arXiv Detail & Related papers (2025-03-11T01:07:57Z)
- Large Language Models as Realistic Microservice Trace Generators [54.85489678342595]
Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources.
This paper proposes a first-of-a-kind approach that relies on training a large language model to generate synthetic workload traces.
Our model adapts to downstream trace-related tasks, such as predicting key trace features and infilling missing data.
arXiv Detail & Related papers (2024-12-16T12:48:04Z)
- Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models [25.301280441283147]
This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance.
We develop a novel retrieval evaluation benchmark spanning six document-level attributes.
Our findings indicate that although fine-tuning models on instruction-aware retrieval datasets enhances performance, most models still fall short of instruction compliance.
arXiv Detail & Related papers (2024-10-31T11:47:21Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Enhanced Transformer architecture for in-context learning of dynamical systems [0.3749861135832073]
In this paper, we enhance the original meta-modeling framework through three key innovations.
The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class.
arXiv Detail & Related papers (2024-10-04T10:05:15Z)
- Making Text Embedders Few-Shot Learners [33.50993377494602]
We introduce a novel model bge-en-icl, which employs few-shot examples to produce high-quality text embeddings.
Our approach integrates task-related examples directly into the query side, resulting in significant improvements across various tasks.
Experimental results on the MTEB and AIR-Bench benchmarks demonstrate that our approach sets new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-09-24T03:30:19Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning [19.79913796167022]
We present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads.
Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program.
We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models.
arXiv Detail & Related papers (2023-11-01T23:03:59Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- Reinforcement Learning based Path Exploration for Sequential Explainable Recommendation [57.67616822888859]
We propose a novel Temporal Meta-path Guided Explainable Recommendation leveraging Reinforcement Learning (TMER-RL).
TMER-RL utilizes reinforcement item-item path modelling between consecutive items with attention mechanisms to sequentially model dynamic user-item evolutions on a dynamic knowledge graph for explainable recommendation.
Extensive evaluations of TMER on two real-world datasets show state-of-the-art performance compared against recent strong baselines.
arXiv Detail & Related papers (2021-11-24T04:34:26Z)
- OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using a reinforcement policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework for active learning on various tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.