Large language models require a new form of oversight: capability-based monitoring
- URL: http://arxiv.org/abs/2511.03106v1
- Date: Wed, 05 Nov 2025 01:20:28 GMT
- Title: Large language models require a new form of oversight: capability-based monitoring
- Authors: Katherine C. Kellogg, Bingyang Ye, Yifan Hu, Guergana K. Savova, Byron Wallace, Danielle S. Bitterman
- Abstract summary: The rapid adoption of large language models (LLMs) in healthcare has been accompanied by scrutiny of their oversight. We propose a new organizing principle guiding generalist LLM monitoring that is scalable and grounded in how these models are developed and used in practice: capability-based monitoring. We describe considerations for developers, organizational leaders, and professional societies for implementing a capability-based monitoring approach.
- Score: 10.382163755118713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid adoption of large language models (LLMs) in healthcare has been accompanied by scrutiny of their oversight. Existing monitoring approaches, inherited from traditional machine learning (ML), are task-based and founded on assumed performance degradation arising from dataset drift. In contrast, with LLMs, inevitable model degradation due to changes in populations compared to the training dataset cannot be assumed, because LLMs were not trained for any specific task in any given population. We therefore propose a new organizing principle guiding generalist LLM monitoring that is scalable and grounded in how these models are developed and used in practice: capability-based monitoring. Capability-based monitoring is motivated by the fact that LLMs are generalist systems whose overlapping internal capabilities are reused across numerous downstream tasks. Instead of evaluating each downstream task independently, this approach organizes monitoring around shared model capabilities, such as summarization, reasoning, translation, or safety guardrails, in order to enable cross-task detection of systemic weaknesses, long-tail errors, and emergent behaviors that task-based monitoring may miss. We describe considerations for developers, organizational leaders, and professional societies for implementing a capability-based monitoring approach. Ultimately, capability-based monitoring will provide a scalable foundation for safe, adaptive, and collaborative monitoring of LLMs and future generalist artificial intelligence models in healthcare.
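To make the organizing principle concrete, the sketch below groups task-level error signals by the capabilities each task reuses, so a weakness in one capability surfaces across every task that depends on it. The task names, capability assignments, and error rates are hypothetical illustrations, not from the paper.
```python
# Minimal sketch: aggregate task-level error signals by shared capability
# instead of monitoring each downstream task in isolation.
# All task names, capability mappings, and error rates are hypothetical.
from collections import defaultdict

TASK_CAPABILITIES = {
    "discharge_summary": ["summarization", "safety_guardrails"],
    "radiology_report_qa": ["reasoning", "summarization"],
    "consent_form_translation": ["translation", "safety_guardrails"],
}

def capability_error_rates(task_errors: dict[str, float]) -> dict[str, float]:
    """Average observed task error rates over the capabilities they share."""
    totals, counts = defaultdict(float), defaultdict(int)
    for task, err in task_errors.items():
        for cap in TASK_CAPABILITIES[task]:
            totals[cap] += err
            counts[cap] += 1
    return {cap: totals[cap] / counts[cap] for cap in totals}

# A spike in one capability (here, summarization) flags every task that
# reuses it, including tasks whose own sample sizes are too small to alarm.
print(capability_error_rates({
    "discharge_summary": 0.04,
    "radiology_report_qa": 0.11,
    "consent_form_translation": 0.02,
}))
```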
Related papers
- LLM-Assisted Logic Rule Learning: Scaling Human Expertise for Time Series Anomaly Detection [0.9740025522928777]
Time series anomaly detection is critical for enabling proactive operations in supply chain management. We propose a framework that leverages large language models (LLMs) to systematically encode human expertise into interpretable, logic-based rules.
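As an illustration of what such a rule might look like, here is a hypothetical logic-based anomaly rule in Python; the features and thresholds are invented, not taken from the paper.
```python
# Hypothetical example of the kind of interpretable, logic-based rule an
# LLM might distill from expert knowledge; features and thresholds invented.
def is_anomalous(demand: float, rolling_mean: float, rolling_std: float,
                 lead_time_days: int) -> bool:
    """Flag a reading that deviates sharply while supply is slow to react."""
    deviates = abs(demand - rolling_mean) > 3 * rolling_std
    slow_supply = lead_time_days > 14
    return deviates and slow_supply

print(is_anomalous(demand=950, rolling_mean=500, rolling_std=80,
                   lead_time_days=21))  # True: 450 > 240 and 21 > 14
```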
arXiv Detail & Related papers (2026-01-27T06:37:37Z)
- LLM Performance Predictors: Learning When to Escalate in Hybrid Human-AI Moderation Systems [5.7001352660257005]
We propose a framework for supervised uncertainty quantification in content moderation systems. We show that our method enables cost-aware selective classification in real-world human-AI moderation systems. This work establishes a principled framework for uncertainty-aware, scalable, and responsible human-AI moderation.
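A minimal sketch of the escalation idea, assuming a scalar uncertainty score per item; the threshold and routing labels are illustrative, not the paper's actual framework.
```python
# Schematic of uncertainty-based escalation: route items the model is
# unsure about to human moderators. Threshold and labels are illustrative.
def route(prediction: str, uncertainty: float, threshold: float = 0.3) -> str:
    """Escalate to a human when predicted uncertainty exceeds the threshold."""
    return "human_review" if uncertainty > threshold else prediction

# Lowering the threshold escalates more items, trading review cost for
# fewer missed violations; that trade-off is what "cost-aware" tunes.
print(route("allow", uncertainty=0.12))  # -> allow
print(route("allow", uncertainty=0.45))  # -> human_review
```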
arXiv Detail & Related papers (2026-01-11T17:46:49Z)
- AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor [19.39430341586964]
AutoMonitor-Bench is the first benchmark to systematically evaluate the reliability of misbehavior monitors across diverse tasks and failure modes. We evaluate monitors using two complementary metrics: Miss Rate (MR) and False Alarm Rate (FAR), capturing failures to detect misbehavior and oversensitivity to benign behavior, respectively. We construct a large-scale training corpus of 153,581 samples and fine-tune Qwen3-4B-Instruct to investigate whether training on known, relatively easy-to-construct misbehavior datasets improves monitoring performance on unseen and more implicit misbehaviors.
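The two metrics reduce to simple ratios over labeled monitor outputs. A plain-definition sketch (the function names are ours, not the benchmark's API):
```python
# Miss Rate and False Alarm Rate as plain ratios over labeled outcomes.
# (Definitions only; not the benchmark's actual evaluation code.)
def miss_rate(misbehaviors_total: int, misbehaviors_flagged: int) -> float:
    """Fraction of true misbehaviors the monitor failed to flag."""
    return 1 - misbehaviors_flagged / misbehaviors_total

def false_alarm_rate(benign_total: int, benign_flagged: int) -> float:
    """Fraction of benign behaviors the monitor incorrectly flagged."""
    return benign_flagged / benign_total

print(miss_rate(200, 170))        # MR  = 0.15
print(false_alarm_rate(800, 40))  # FAR = 0.05
```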
arXiv Detail & Related papers (2026-01-09T12:09:45Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent, which translates high-level research objectives into standardized experimental configurations, with an Experiment Manager that orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning [1.6114012813668932]
Small LLMs struggle to develop a generic Theory of Mind (ToM) capability. Prolonged RL training leads to models "hacking" the statistical patterns of the training datasets. This suggests the learned behavior is a form of narrow overfitting rather than the acquisition of a true, abstract ToM capability.
arXiv Detail & Related papers (2025-07-21T16:47:59Z)
- Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization [45.799380822683034]
We present an extensive study aimed at advancing RL-based finetuning techniques for Large Language Models (LLMs). We highlight key limitations of commonly adopted LLMs, such as their tendency to over-predict certain types of vulnerabilities while failing to detect others. To address this challenge, we explore the use of Group Relative Policy Optimization (GRPO), a recent policy-gradient method, for guiding LLM behavior through structured, rule-based rewards.
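GRPO's core step is to normalize each sampled response's reward against the rest of its group, so no learned value function is needed. A simplified sketch, with an invented exact-match rule reward standing in for the paper's reward scheme:
```python
# Simplified GRPO-style update signal: score a group of sampled responses
# with a rule-based reward, then normalize each reward within the group.
# The exact-match reward below is an invented stand-in, not the paper's.
import statistics

def rule_reward(response: str, ground_truth: str) -> float:
    return 1.0 if response.strip() == ground_truth else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

samples = ["CWE-89", "CWE-79", "CWE-89", "CWE-20"]
rewards = [rule_reward(s, "CWE-89") for s in samples]
print(group_relative_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
```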
arXiv Detail & Related papers (2025-07-03T11:52:45Z)
- Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation [36.17444261325021]
Visual Language Navigation (VLN) is a fundamental task within the field of Embodied AI, focusing on the ability of agents to navigate complex environments based on natural language instructions. Existing methods rely on pre-trained backbone models for visual perception, which struggle with the dynamic viewpoints in VLN scenarios. We propose Weakly-supervised Partial Contrastive Learning (WPCL), a method that enhances an agent's ability to identify objects from dynamic viewpoints in VLN scenarios without requiring VLM fine-tuning.
arXiv Detail & Related papers (2025-06-18T11:43:50Z)
- Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments. We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets. We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z)
- Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring [18.837335987273256]
Large language models (LLMs) are becoming increasingly capable, but the mechanisms of their thinking and decision-making processes remain unclear. We propose a novel method, TELLME, improving the transparency of LLMs and helping monitors identify unsuitable and sensitive behaviors.
arXiv Detail & Related papers (2025-02-07T13:25:33Z)
- Coalitions of Large Language Models Increase the Robustness of AI Agents [3.216132991084434]
Large Language Models (LLMs) have fundamentally altered the way we interact with digital systems.
LLMs are powerful and capable of demonstrating some emergent properties, but struggle to perform well at all sub-tasks carried out by an AI agent.
We assess whether a system comprising a coalition of pretrained LLMs, each exhibiting specialised performance at individual sub-tasks, can match the performance of single-model agents.
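A coalition of this kind can be approximated by a simple router that sends each sub-task to its specialist model. A minimal sketch; the model names, sub-task labels, and call_model helper are hypothetical placeholders:
```python
# Sketch of a coalition agent: route each sub-task to the pretrained model
# that specializes in it. Model names, sub-task labels, and call_model are
# hypothetical placeholders.
SPECIALISTS = {
    "planning": "planner-llm",
    "retrieval": "retriever-llm",
    "summarization": "summarizer-llm",
}

def call_model(model: str, payload: str) -> str:
    # Placeholder for whatever inference API the deployment uses.
    return f"[{model}] handled: {payload}"

def dispatch(subtask: str, payload: str) -> str:
    """Send the sub-task to its specialist, or to a generalist fallback."""
    return call_model(SPECIALISTS.get(subtask, "generalist-llm"), payload)

print(dispatch("planning", "break the user goal into steps"))
print(dispatch("coding", "write a parser"))  # falls back to the generalist
```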
arXiv Detail & Related papers (2024-08-02T16:37:44Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the assumption that base LLMs, lacking instruction tuning, are safe from misuse.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models [61.28463542324576]
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can generate human-like outputs.
We evaluate existing state-of-the-art VLMs and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency.
We propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs.
arXiv Detail & Related papers (2023-09-08T17:49:44Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to address this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)