Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis
- URL: http://arxiv.org/abs/2409.07878v2
- Date: Wed, 25 Sep 2024 16:19:25 GMT
- Title: Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis
- Authors: Oscar Delaney, Oliver Guest, Zoe Williams,
- Abstract summary: Report analyzes the technical research into safe AI development being conducted by three leading AI companies.
Anthropic, Google DeepMind, and OpenAI.
We defined safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems become more advanced, concerns about large-scale risks from misuse or accidents have grown. This report analyzes the technical research into safe AI development being conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks. This encompasses a range of technical approaches aimed at ensuring AI systems behave as intended and do not cause unintended harm, even as they are made more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to July 2024 that were relevant to safe AI development, and categorized the 80 included papers into nine safety approaches. Additionally, we noted two categories representing nascent approaches explored by academia and civil society, but not currently represented in any research papers by these leading AI companies. Our analysis reveals where corporate attention is concentrated and where potential gaps lie. Some AI research may stay unpublished for good reasons, such as to not inform adversaries about the details of security techniques they would need to overcome to misuse AI systems. Therefore, we also considered the incentives that AI companies have to research each approach, regardless of how much work they have published on the topic. We identified three categories where there are currently no or few papers and where we do not expect AI companies to become much more incentivized to pursue this research in the future. These are model organisms of misalignment, multi-agent safety, and safety by design. Our findings provide an indication that these approaches may be slow to progress without funding or efforts from government, civil society, philanthropists, or academia.
Related papers
- Towards evaluations-based safety cases for AI scheming [37.399946932069746]
We propose three arguments that safety cases could use in relation to scheming.
First, developers of frontier AI systems could argue that AI systems are not capable of scheming.
Second, one could argue that AI systems are not capable of posing harm through scheming.
Third, one could argue that control measures around the AI systems would prevent unacceptable outcomes even if the AI systems intentionally attempted to subvert them.
arXiv Detail & Related papers (2024-10-29T17:55:29Z) - A Survey on Offensive AI Within Cybersecurity [1.8206461789819075]
This survey paper on offensive AI will comprehensively cover various aspects related to attacks against and using AI systems.
It will delve into the impact of offensive AI practices on different domains, including consumer, enterprise, and public digital infrastructure.
The paper will explore adversarial machine learning, attacks against AI models, infrastructure, and interfaces, along with offensive techniques like information gathering, social engineering, and weaponized AI.
arXiv Detail & Related papers (2024-09-26T17:36:22Z) - The Narrow Depth and Breadth of Corporate Responsible AI Research [3.364518262921329]
We show that the majority of AI firms show limited or no engagement in this critical subfield of AI.
Leading AI firms exhibit significantly lower output in responsible AI research compared to their conventional AI research.
Our results highlight the urgent need for industry to publicly engage in responsible AI research.
arXiv Detail & Related papers (2024-05-20T17:26:43Z) - Work-in-Progress: Crash Course: Can (Under Attack) Autonomous Driving Beat Human Drivers? [60.51287814584477]
This paper evaluates the inherent risks in autonomous driving by examining the current landscape of AVs.
We develop specific claims highlighting the delicate balance between the advantages of AVs and potential security challenges in real-world scenarios.
arXiv Detail & Related papers (2024-05-14T09:42:21Z) - Near to Mid-term Risks and Opportunities of Open-Source Generative AI [94.06233419171016]
Applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source Generative AI.
arXiv Detail & Related papers (2024-04-25T21:14:24Z) - Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits [54.648819983899614]
General purpose AI seems to have lowered the barriers for the public to use AI and harness its power.
We introduce PARTICIP-AI, a framework for laypeople to speculate and assess AI use cases and their impacts.
arXiv Detail & Related papers (2024-03-21T19:12:37Z) - Towards more Practical Threat Models in Artificial Intelligence Security [66.67624011455423]
Recent works have identified a gap between research and practice in artificial intelligence security.
We revisit the threat models of the six most studied attacks in AI security research and match them to AI usage in practice.
arXiv Detail & Related papers (2023-11-16T16:09:44Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - AI Deception: A Survey of Examples, Risks, and Potential Solutions [20.84424818447696]
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.
arXiv Detail & Related papers (2023-08-28T17:59:35Z) - Proceedings of the Artificial Intelligence for Cyber Security (AICS)
Workshop at AAAI 2022 [55.573187938617636]
The workshop will focus on the application of AI to problems in cyber security.
Cyber systems generate large volumes of data, utilizing this effectively is beyond human capabilities.
arXiv Detail & Related papers (2022-02-28T18:27:41Z) - An Ethical Framework for Guiding the Development of Affectively-Aware
Artificial Intelligence [0.0]
We propose guidelines for evaluating the (moral and) ethical consequences of affectively-aware AI.
We propose a multi-stakeholder analysis framework that separates the ethical responsibilities of AI Developers vis-a-vis the entities that deploy such AI.
We end with recommendations for researchers, developers, operators, as well as regulators and law-makers.
arXiv Detail & Related papers (2021-07-29T03:57:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.