A Framework for Automated Measurement of Responsible AI Harms in
Generative AI Applications
- URL: http://arxiv.org/abs/2310.17750v1
- Date: Thu, 26 Oct 2023 19:45:06 GMT
- Title: A Framework for Automated Measurement of Responsible AI Harms in
Generative AI Applications
- Authors: Ahmed Magooda, Alec Helyar, Kyle Jackson, David Sullivan, Chad Atalla,
Emily Sheng, Dan Vann, Richard Edgar, Hamid Palangi, Roman Lutz, Hongliang
Kong, Vincent Yun, Eslam Kamal, Federico Zarfati, Hanna Wallach, Sarah Bird,
Mei Chen
- Abstract summary: We present a framework for the automated measurement of responsible AI (RAI) metrics for large language models (LLMs)
Our framework for automatically measuring harms from LLMs builds on existing technical and sociotechnical expertise.
We use this framework to run through several case studies investigating how different LLMs may violate a range of RAI-related principles.
- Score: 15.087045120842207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework for the automated measurement of responsible AI (RAI)
metrics for large language models (LLMs) and associated products and services.
Our framework for automatically measuring harms from LLMs builds on existing
technical and sociotechnical expertise and leverages the capabilities of
state-of-the-art LLMs, such as GPT-4. We use this framework to run through
several case studies investigating how different LLMs may violate a range of
RAI-related principles. The framework may be employed alongside domain-specific
sociotechnical expertise to create measurements for new harm areas in the
future. By implementing this framework, we aim to enable more advanced harm
measurement efforts and further the responsible use of LLMs.
Related papers
- MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.
We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents.
We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z) - Gaps Between Research and Practice When Measuring Representational Harms Caused by LLM-Based Systems [88.35461485731162]
We identify four types of challenges that prevent practitioners from effectively using publicly available instruments for measuring representational harms.
Our goal is to advance the development of instruments for measuring representational harms that are well-suited to practitioner needs.
arXiv Detail & Related papers (2024-11-23T22:13:38Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future [15.568939568441317]
We investigate the current practice and solutions for large language models (LLMs) and LLM-based agents for software engineering.
In particular we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance.
We discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering.
arXiv Detail & Related papers (2024-08-05T14:01:15Z) - Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting.
We propose a LLM-based Generative IoT (GIoT) system deployed in the local network setting in this study.
arXiv Detail & Related papers (2024-06-14T19:24:00Z) - Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information.
We evaluate the capability of state-of-art LLMs to be used as Question Answering (QA) assistants.
Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z) - Towards Generating Executable Metamorphic Relations Using Large Language Models [46.26208489175692]
We propose an approach for automatically deriving executable MRs from requirements using large language models (LLMs)
To assess the feasibility of our approach, we conducted a questionnaire-based survey in collaboration with Siemens Industry Software.
arXiv Detail & Related papers (2024-01-30T13:52:47Z) - TPTU: Large Language Model-based AI Agents for Task Planning and Tool
Usage [28.554981886052953]
Large Language Models (LLMs) have emerged as powerful tools for various real-world applications.
Despite their prowess, intrinsic generative abilities of LLMs may prove insufficient for handling complex tasks.
This paper proposes a structured framework tailored for LLM-based AI Agents.
arXiv Detail & Related papers (2023-08-07T09:22:03Z) - Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z) - Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.