Related papers: Scalable Oversight via Partitioned Human Supervision

URL: http://arxiv.org/abs/2510.22500v1
Date: Sun, 26 Oct 2025 02:42:03 GMT
Title: Scalable Oversight via Partitioned Human Supervision
Authors: Ren Yin, Takashi Ishida, Masashi Sugiyama,
Abstract summary: Even the best human experts are knowledgeable only in a single narrow area. Humans may provide a weak signal, i.e., a complementary label indicating an option that is incorrect. We propose a scalable oversight framework that enables us to evaluate frontier AI systems without the need to prepare the ground truth.
Score: 47.001801756596926
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As artificial intelligence (AI) systems approach and surpass expert human performance across a broad range of tasks, obtaining high-quality human supervision for evaluation and training becomes increasingly challenging. Our focus is on tasks that require deep knowledge and skills of multiple domains. Unfortunately, even the best human experts are knowledgeable only in a single narrow area, and will not be able to evaluate the correctness of advanced AI systems on such superhuman tasks. However, based on their narrow expertise, humans may provide a weak signal, i.e., a complementary label indicating an option that is incorrect. For example, a cardiologist could state that "this is not related to cardiology," even if they cannot identify the true disease. Based on this weak signal, we propose a scalable oversight framework that enables us to evaluate frontier AI systems without the need to prepare the ground truth. We derive an unbiased estimator of top-1 accuracy from complementary labels and quantify how many complementary labels are needed to match the variance of ordinary labels. We further introduce two estimators to combine scarce ordinary labels with abundant complementary labels. We provide finite-sample deviation guarantees for both the complementary-only and the mixed estimators. Empirically, we show that we can evaluate the output of large language models without the ground truth, if we have complementary labels. We further show that we can train an AI system with such weak signals: we show how we can design an agentic AI system automatically that can perform better with this partitioned human supervision. Our code is available at https://github.com/R-Yin-217/Scalable-Oversight-via-Human-Partitioned-Supervision.
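The abstract mentions an unbiased top-1 accuracy estimator built from complementary labels, a variance-matching label count, mixed estimators, and finite-sample guarantees. The Python sketch below illustrates these ideas under one stated assumption: each complementary label is drawn uniformly at random from the K - 1 incorrect options. That sampling model and every function name are assumptions for illustration and are not taken from the paper or its repository. Under this assumption, a prediction coincides with its complementary label with probability (1 - acc)/(K - 1), and that relation can be inverted to recover accuracy. The inverse-variance mix and the Hoeffding radius are generic textbook tools. They are not claimed to be the paper's two mixed estimators or its stated guarantees.

```python
import math
from typing import Sequence


def complementary_accuracy(preds: Sequence[int], comp_labels: Sequence[int], num_classes: int) -> float:
    """Estimate top-1 accuracy from complementary labels (hypothetical helper).

    ASSUMPTION: each complementary label is drawn uniformly from the
    num_classes - 1 classes other than the true one. Then
    P(pred == comp) = (1 - acc) / (K - 1), hence acc = 1 - (K - 1) * P(pred == comp).
    """
    if not preds or len(preds) != len(comp_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == c for p, c in zip(preds, comp_labels))
    return 1.0 - (num_classes - 1) * hits / len(preds)


def complementary_labels_needed(n_ordinary: int, acc: float, num_classes: int) -> float:
    """Number m of complementary labels whose estimator variance matches n ordinary labels.

    Var(ordinary) = acc(1 - acc) / n and, under the uniform assumption,
    Var(complementary) = (1 - acc)(K - 2 + acc) / m, so m = n (K - 2 + acc) / acc.
    """
    if not 0.0 < acc <= 1.0:
        raise ValueError("acc must lie in (0, 1]")
    return n_ordinary * (num_classes - 2 + acc) / acc


def inverse_variance_mix(acc_ord: float, n: int, acc_comp: float, m: int,
                         num_classes: int, acc_guess: float) -> float:
    """Combine both estimates by inverse-variance weighting (generic technique).

    The shared (1 - acc) variance factor cancels, so the weight depends only on a
    pilot guess of accuracy. NOT claimed to be either of the paper's mixed estimators.
    """
    if not 0.0 < acc_guess <= 1.0:
        raise ValueError("acc_guess must lie in (0, 1]")
    var_ord = acc_guess / n                        # ordinary variance divided by (1 - acc)
    var_comp = (num_classes - 2 + acc_guess) / m   # complementary variance divided by (1 - acc)
    w_ord = var_comp / (var_ord + var_comp)        # lower-variance estimate gets more weight
    return w_ord * acc_ord + (1.0 - w_ord) * acc_comp


def hoeffding_radius(m: int, num_classes: int, delta: float) -> float:
    """Half-width eps with P(|estimate - acc| >= eps) <= delta, via Hoeffding's inequality.

    Each per-sample term 1 - (K - 1) * 1[pred == comp] lies in an interval of
    length K - 1, giving eps = (K - 1) * sqrt(ln(2 / delta) / (2 m)).
    """
    return (num_classes - 1) * math.sqrt(math.log(2.0 / delta) / (2.0 * m))


if __name__ == "__main__":
    preds = [0, 1, 2, 3]
    comp = [1, 1, 0, 2]  # only the second prediction hits its complementary label
    print(complementary_accuracy(preds, comp, num_classes=4))        # 1 - 3 * 1/4 = 0.25
    print(complementary_labels_needed(100, acc=0.5, num_classes=4))  # 500.0
```

Under this sampling model, a complementary label carries less information per item than an ordinary label whenever K > 2. For example, with K = 4 and an accuracy of 0.5, about five complementary labels are needed to match the variance of one ordinary label. With K = 2, a complementary label identifies the true class, and the two label types are equivalent.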

Related papers

Explainable AI for Collaborative Assessment of 2D/3D Registration Quality [50.65650507103078]
We propose the first artificial intelligence framework trained specifically for 2D/3D registration quality verification. Our explainable AI (XAI) approach aims to enhance informed decision-making for human operators.
arXiv (2025-07-23T15:28:57Z)
Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care [2.4339626079536925]
The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis. Despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside. This scoping review aims to highlight the areas where AI is limited in making practical contributions in the clinical setting.
arXiv (2025-07-02T01:43:06Z)
Image Quality Assessment for Embodied AI [103.66095742463195]
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories. There is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots.
arXiv (2025-05-22T15:51:07Z)
On the Interplay of Human-AI Alignment, Fairness, and Performance Trade-offs in Medical Imaging [3.054669417364281]
We provide the first systematic exploration of Human-AI alignment and fairness in this domain. Our results show that incorporating human insights consistently reduces fairness gaps and enhances out-of-domain generalization. These findings highlight Human-AI alignment as a promising approach for developing fair, robust, and generalizable medical AI systems.
arXiv (2025-05-15T12:43:23Z)
Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem [1.3905735045377272]
The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents.
arXiv (2025-05-05T11:33:18Z)
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision [98.97575836717931]
Current AI alignment methodologies rely on human-provided demonstrations or judgments. This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans?
arXiv (2024-03-14T15:12:38Z)
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions. This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision. We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv (2023-05-04T17:59:28Z)
Human Uncertainty in Concept-Based AI Systems [37.82747673914624]
We study human uncertainty in the context of concept-based AI systems. We show that training with uncertain concept labels may help mitigate weaknesses in concept-based systems.
arXiv (2023-03-22T19:17:57Z)
Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv (2022-11-23T16:16:52Z)
A Human-Centric Assessment Framework for AI [11.065260433086024]
There is no agreed standard on how explainable AI systems should be assessed. Inspired by the Turing test, we introduce a human-centric assessment framework. This setup can serve as a framework for a wide range of human-centric AI system assessments.
arXiv (2022-05-25T12:59:13Z)
Learning to Complement Humans [67.38348247794949]
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks. We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams.
arXiv (2020-05-01T20:00:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.

This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.