Upstream and Downstream AI Safety: Both on the Same River?
- URL: http://arxiv.org/abs/2501.05455v1
- Date: Mon, 09 Dec 2024 23:33:31 GMT
- Title: Upstream and Downstream AI Safety: Both on the Same River?
- Authors: John McDermid, Yan Jia, Ibrahim Habli
- Abstract summary: Traditional safety engineering assesses systems in their context of use, e.g. the operational design domain for self-driving vehicles. In contrast, work on the safety of frontier AI, e.g. large language models which can be further trained for downstream tasks, typically considers factors that lie beyond specific application contexts. We outline the characteristics of both upstream and downstream safety frameworks, then explore the extent to which the broad AI safety community can benefit from synergies between them.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional safety engineering assesses systems in their context of use, e.g. the operational design domain (road layout, speed limits, weather, etc.) for self-driving vehicles (including those using AI). We refer to this as downstream safety. In contrast, work on safety of frontier AI, e.g. large language models which can be further trained for downstream tasks, typically considers factors that are beyond specific application contexts, such as the ability of the model to evade human control, or to produce harmful content, e.g. how to make bombs. We refer to this as upstream safety. We outline the characteristics of both upstream and downstream safety frameworks, then explore the extent to which the broad AI safety community can benefit from synergies between these frameworks. For example, can concepts such as common mode failures from downstream safety be used to help assess the strength of AI guardrails? Further, can the understanding of the capabilities and limitations of frontier AI be used to inform downstream safety analysis, e.g. where LLMs are fine-tuned to calculate voyage plans for autonomous vessels? The paper identifies some promising avenues to explore and outlines some challenges in achieving synergy, or a confluence, between upstream and downstream safety frameworks.
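The question about common mode failures and guardrails can be made concrete with a toy calculation. The sketch below applies the beta-factor model from classical safety engineering, in which a fraction beta of each component's failures is attributed to a shared cause, to two stacked guardrails. The failure rates, the beta value, and the scenario are illustrative assumptions, not figures from the paper.

```python
# Toy sketch: the beta-factor common-mode model applied to two stacked
# AI guardrails. All names and numbers are illustrative assumptions.

def joint_failure_independent(p_a: float, p_b: float) -> float:
    """Probability that both guardrails fail, assuming independence."""
    return p_a * p_b


def joint_failure_common_mode(p_a: float, p_b: float, beta: float) -> float:
    """Beta-factor model: a fraction `beta` of each guardrail's failures
    stems from a shared cause (e.g. both filters derive from the same
    base model), so the two tend to fail together."""
    common = beta * min(p_a, p_b)                          # shared-cause failures
    independent = ((1 - beta) * p_a) * ((1 - beta) * p_b)  # residual independent part
    return common + independent


p_filter, p_classifier = 0.01, 0.02  # assumed per-guardrail failure rates

print(joint_failure_independent(p_filter, p_classifier))       # 2e-4
print(joint_failure_common_mode(p_filter, p_classifier, 0.1))  # ~1.2e-3
```

Even a modest common-mode fraction dominates the naive independence estimate, which is the kind of insight downstream safety analysis could lend to upstream guardrail assessment.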
Related papers
- The BIG Argument for AI Safety Cases [4.0675753909100445]
The BIG argument adopts a whole-system approach to constructing a safety case for AI systems of varying capability, autonomy and criticality.
It is balanced by addressing safety alongside other critical ethical issues such as privacy and equity.
It is integrated by bringing together the social, ethical and technical aspects of safety assurance in a way that is traceable and accountable.
arXiv Detail & Related papers (2025-03-12T11:33:28Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
- Safety cases for frontier AI [0.8987776881291144]
Safety cases are reports that make a structured argument, supported by evidence, that a system is safe enough in a given operational context.
Safety cases are already common in other safety-critical industries such as aviation and nuclear power.
We explain why they may also be a useful tool in frontier AI governance, both in industry self-regulation and government regulation.
arXiv Detail & Related papers (2024-10-28T22:08:28Z)
- SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior [56.10557932893919]
We present SafetyAnalyst, a novel AI safety moderation framework. Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences. It aggregates all harmful and beneficial effects into a harmfulness score using fully interpretable weight parameters.
arXiv Detail & Related papers (2024-10-22T03:38:37Z)
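As a rough illustration of the kind of interpretable weighted aggregation the SafetyAnalyst summary above describes, the sketch below combines per-effect likelihoods into a single harmfulness score. The effect names and weights are invented for illustration and are not SafetyAnalyst's actual parameters.

```python
# Minimal sketch of an interpretable harmfulness score in the style the
# SafetyAnalyst summary describes. Effect names and weights are invented
# stand-ins, not the framework's real parameters.

WEIGHTS = {                       # hand-auditable, signed weight parameters
    "physical_harm": 3.0,
    "privacy_violation": 2.0,
    "misinformation": 1.5,
    "educational_benefit": -1.0,  # beneficial effects reduce the score
    "emotional_support": -0.5,
}


def harmfulness_score(effects: dict[str, float]) -> float:
    """Aggregate per-effect likelihoods (0..1) into one score.
    Positive contributions are harms, negative ones are benefits."""
    return sum(WEIGHTS[name] * likelihood
               for name, likelihood in effects.items())


# Example: a response judged likely to mislead but also somewhat educational.
print(harmfulness_score({"misinformation": 0.8,
                         "educational_benefit": 0.3}))  # ≈ 0.9
```

Because each weight is a named, human-readable parameter, the aggregation stays steerable: adjusting a single weight changes the moderation policy in an inspectable way.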
- Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
- Work-in-Progress: Crash Course: Can (Under Attack) Autonomous Driving Beat Human Drivers? [60.51287814584477]
This paper evaluates the inherent risks in autonomous driving by examining the current landscape of AVs.
We develop specific claims highlighting the delicate balance between the advantages of AVs and potential security challenges in real-world scenarios.
arXiv Detail & Related papers (2024-05-14T09:42:21Z)
- Safety Analysis of Autonomous Railway Systems: An Introduction to the SACRED Methodology [2.47737926497181]
We introduce SACRED, a safety methodology for producing an initial safety case for autonomous systems.
The development of SACRED is motivated by the proposed GoA-4 light-rail system in Berlin.
arXiv Detail & Related papers (2024-03-18T11:12:19Z)
- Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer [28.15612357340141]
We propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser).
We process and fuse information from multi-modal multi-view sensors for achieving comprehensive scene understanding and adversarial event detection.
Our framework provides richer semantics, which are exploited to better constrain actions to lie within the safe sets.
arXiv Detail & Related papers (2022-07-28T11:36:21Z)
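A minimal sketch of the safe-set constraint idea from the InterFuser summary above, assuming the safe set can be expressed as simple bounds on speed and steering; the interface and values are illustrative assumptions, not InterFuser's actual implementation.

```python
# Toy version of "constrain actions to the safe set": clamp a planned
# control command to bounds derived from scene understanding. Bounds and
# values are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class SafeSet:
    max_speed: float  # m/s, tightened when hazards are detected
    max_steer: float  # rad


def constrain(speed: float, steer: float, safe: SafeSet) -> tuple[float, float]:
    """Project a proposed (speed, steer) command onto the safe set."""
    return (min(speed, safe.max_speed),
            max(-safe.max_steer, min(steer, safe.max_steer)))


# A detected pedestrian tightens the safe set before the action is issued.
safe = SafeSet(max_speed=5.0, max_steer=0.3)
print(constrain(12.0, -0.5, safe))  # (5.0, -0.3)
```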
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
- AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles [76.46575807165729]
We propose AdvSim, an adversarial framework to generate safety-critical scenarios for any LiDAR-based autonomy system.
By simulating directly from sensor data, we obtain adversarial scenarios that are safety-critical for the full autonomy stack.
arXiv Detail & Related papers (2021-01-16T23:23:12Z)
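The AdvSim entry above describes searching for scenario perturbations that are safety-critical for the full autonomy stack. A toy random-search version of that loop, with a stand-in one-dimensional simulator and safety margin in place of AdvSim's sensor-level pipeline, might look like:

```python
# Toy sketch of AdvSim-style adversarial scenario search: perturb another
# actor's trajectory and keep the perturbation that most reduces the ego
# vehicle's safety margin. The world model and cost here are stand-ins,
# not AdvSim's actual LiDAR-level pipeline.

import random


def min_gap(ego_path, actor_path):
    """Smallest ego-actor distance over the rollout (1-D toy world)."""
    return min(abs(e - a) for e, a in zip(ego_path, actor_path))


ego = [0.0, 1.0, 2.0, 3.0, 4.0]     # ego positions per timestep
actor = [10.0, 9.0, 8.0, 7.0, 6.0]  # nominal actor positions

random.seed(0)
worst = (min_gap(ego, actor), actor)
for _ in range(200):                # random-search inner loop
    candidate = [p + random.uniform(-0.5, 0.5) for p in actor]
    gap = min_gap(ego, candidate)
    if gap < worst[0]:              # smaller margin = more safety-critical
        worst = (gap, candidate)

print(f"nominal margin: {min_gap(ego, actor):.2f}, "
      f"adversarial: {worst[0]:.2f}")
```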