Related papers: Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database

Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database

URL: http://arxiv.org/abs/2507.23669v1
Date: Thu, 31 Jul 2025 15:48:12 GMT
Title: Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database
Authors: Diego Russo, Gian Marco Orlando, Valerio La Gatta, Vincenzo Moscato,
Abstract summary: We propose a retrieval-based framework that automates the association of new reports with existing AI Incidents.<n>Our analysis shows that combining titles and descriptions yields substantial improvements in ranking accuracy.<n>Our approach provides a scalable and efficient solution for supporting the maintenance of the AIID.
Score: 7.946359845249688
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Artificial Intelligence (AI) systems are transforming critical sectors such as healthcare, finance, and transportation, enhancing operational efficiency and decision-making processes. However, their deployment in high-stakes domains has exposed vulnerabilities that can result in significant societal harm. To systematically study and mitigate these risk, initiatives like the AI Incident Database (AIID) have emerged, cataloging over 3,000 real-world AI failure reports. Currently, associating a new report with the appropriate AI Incident relies on manual expert intervention, limiting scalability and delaying the identification of emerging failure patterns. To address this limitation, we propose a retrieval-based framework that automates the association of new reports with existing AI Incidents through semantic similarity modeling. We formalize the task as a ranking problem, where each report-comprising a title and a full textual description-is compared to previously documented AI Incidents based on embedding cosine similarity. Benchmarking traditional lexical methods, cross-encoder architectures, and transformer-based sentence embedding models, we find that the latter consistently achieve superior performance. Our analysis further shows that combining titles and descriptions yields substantial improvements in ranking accuracy compared to using titles alone. Moreover, retrieval performance remains stable across variations in description length, highlighting the robustness of the framework. Finally, we find that retrieval performance consistently improves as the training set expands. Our approach provides a scalable and efficient solution for supporting the maintenance of the AIID.

Related papers

Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery.<n>We term this approach as AI-Driven Research for Systems ( ADRS), which iteratively generates, evaluates, and refines solutions.<n>Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning [2.918225266151982]
We present AVIATOR, the first AI-agentic vulnerability injection workflow.<n>It automatically injects realistic, category-specific vulnerabilities for high-fidelity, diverse, large-scale vulnerability dataset generation.<n>It combines semantic analysis, injection synthesis enhanced with LoRA-based fine-tuning and Retrieval-Augmented Generation, as well as post-injection validation via static analysis and LLM-based discriminators.
arXiv Detail & Related papers (2025-08-28T14:59:39Z)
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms.<n>This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z)
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science [70.33796196103499]
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems.<n>Existing frameworks depend on rigid, pre-defined and inflexible coding strategies.<n>We introduce AutoMind, an adaptive, knowledgeable LLM-agent framework.
arXiv Detail & Related papers (2025-06-12T17:59:32Z)
Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations [0.0]
This paper introduces a novel parallel discrete event simulation (PDES) based methodology to combine multiple AI and non-AI agents.<n>We evaluate our approach by solving four problems from four different domains and comparing the results with those from AI models alone.<n>Results show that overall accuracy of our approach is 68% where as the accuracy of vanilla models is less than 23%.
arXiv Detail & Related papers (2025-05-28T17:50:01Z)
Shifting AI Efficiency From Model-Centric to Data-Centric Compression [67.45087283924732]
We argue that the focus of research for AI is shifting from model-centric compression to data-centric compression.<n>Data-centric compression improves AI efficiency by directly compressing the volume of data processed during model training or inference.<n>Our work aims to provide a novel perspective on AI efficiency, synthesize existing efforts, and catalyze innovation to address the challenges posed by ever-increasing context lengths.
arXiv Detail & Related papers (2025-05-25T13:51:17Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development.<n>We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents.<n>We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.<n>Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.<n>We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z)
Multi-Agent Actor-Critic Generative AI for Query Resolution and Analysis [1.0124625066746598]
We introduce MASQRAD, a transformative framework for query resolution based on the actor-critic model.<n> MASQRAD is excellent at translating imprecise or ambiguous user inquiries into precise and actionable requests.<n> MASQRAD functions as a sophisticated multi-agent system but "masquerades" to users as a single AI entity.
arXiv Detail & Related papers (2025-02-17T04:03:15Z)
Standardised schema and taxonomy for AI incident databases in critical digital infrastructure [2.209921757303168]
The rapid deployment of Artificial Intelligence in critical digital infrastructure introduces significant risks.<n>Existing databases lack the granularity as well as the standardized structure required for consistent data collection and analysis.<n>This work proposes a standardized schema and taxonomy for AI incident databases, enabling detailed and structured documentation of AI incidents across sectors.
arXiv Detail & Related papers (2025-01-28T15:59:01Z)
Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm [0.0]
This paper presents a novel agentic AI solution built on a Weighted Retrieval-Augmented Generation (RAG) Framework tailored for enterprise technical troubleshooting.<n>By dynamically weighting retrieval sources such as product manuals, internal knowledge bases, FAQ, and troubleshooting guides, the framework prioritizes the most relevant data.<n>Preliminary evaluations on large enterprise datasets demonstrate the framework's efficacy in improving troubleshooting accuracy, reducing resolution times, and adapting to varied technical challenges.
arXiv Detail & Related papers (2024-12-16T17:32:38Z)
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
Key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL) This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability [28.67753149592534]
This study bridges the accountability gap by introducing our effort towards a comprehensive metrics catalogue. Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems.
arXiv Detail & Related papers (2023-11-22T04:43:16Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.