In Quest of an Extensible Multi-Level Harm Taxonomy for Adversarial AI: Heart of Security, Ethical Risk Scoring and Resilience Analytics
- URL: http://arxiv.org/abs/2601.16930v1
- Date: Fri, 23 Jan 2026 17:44:05 GMT
- Title: In Quest of an Extensible Multi-Level Harm Taxonomy for Adversarial AI: Heart of Security, Ethical Risk Scoring and Resilience Analytics
- Authors: Javed I. Khan, Sharmila Rahman Prithula
- Abstract summary: Harm is invoked everywhere from cybersecurity, ethics, and risk analysis to adversarial AI. Current discourse relies on vague, under-specified notions of harm, rendering nuanced, structured, and qualitative assessment effectively impossible. We introduce a structured and expandable taxonomy of harms, grounded in an ensemble of contemporary ethical theories.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Harm is invoked everywhere from cybersecurity, ethics, and risk analysis to adversarial AI, yet there exists no systematic or agreed-upon list of harms, and the concept itself is rarely defined with the precision required for serious analysis. Current discourse relies on vague, under-specified notions of harm, rendering nuanced, structured, and qualitative assessment effectively impossible. This paper challenges that gap directly. We introduce a structured and expandable taxonomy of harms, grounded in an ensemble of contemporary ethical theories, that makes harm explicit, enumerable, and analytically tractable. The proposed framework identifies 66+ distinct harm types, systematically organized into two overarching domains, human and nonhuman, and eleven major categories, each explicitly aligned with one of eleven dominant ethical theories. While extensible by design, the upper levels are intentionally stable. Beyond classification, we introduce a theory-aware taxonomy of victim entities and formalize normative harm attributes, including reversibility and duration, that materially alter ethical severity. Together, these contributions transform harm from a rhetorical placeholder into an operational object of analysis, enabling rigorous ethical reasoning and long-term safety evaluation of AI systems and other sociotechnical domains where harm is a first-order concern.
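To make the proposed structure concrete, the sketch below encodes only the elements the abstract states (two domains, eleven theory-aligned categories, 66+ leaf harm types, a victim-entity taxonomy, and normative attributes such as reversibility and duration) as a small data model. All concrete names and the numeric scoring rule are illustrative assumptions; the paper does not publish code or a formula.

```python
# A minimal sketch of the multi-level harm taxonomy as a data model.
# Category, theory, and harm names below are hypothetical examples.
from dataclasses import dataclass
from enum import Enum

class Domain(Enum):
    HUMAN = "human"        # stated in the abstract
    NONHUMAN = "nonhuman"  # stated in the abstract

class Reversibility(Enum):  # normative attribute named in the abstract
    REVERSIBLE = 1
    PARTIALLY_REVERSIBLE = 2
    IRREVERSIBLE = 3

@dataclass(frozen=True)
class HarmType:
    name: str            # one of the 66+ leaf-level harm types
    category: str        # one of the eleven major categories
    ethical_theory: str  # the dominant theory the category aligns with
    domain: Domain

@dataclass
class HarmInstance:
    harm: HarmType
    victim_entity: str   # drawn from the theory-aware victim taxonomy
    reversibility: Reversibility
    duration_days: float  # duration materially alters ethical severity

    def severity_weight(self) -> float:
        # Hypothetical stand-in: the paper formalizes that reversibility
        # and duration alter severity but does not publish this rule.
        return self.reversibility.value * (1.0 + min(self.duration_days / 365.0, 1.0))

# Hypothetical usage:
physical_injury = HarmType("physical_injury", "bodily_harm", "consequentialism", Domain.HUMAN)
incident = HarmInstance(physical_injury, "individual", Reversibility.PARTIALLY_REVERSIBLE, 30.0)
print(f"{incident.severity_weight():.2f}")  # 2.16
```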
Related papers
- Mirror: A Multi-Agent System for AI-Assisted Ethics Review [104.3684024153469]
Mirror is an agentic framework for AI-assisted ethical review. It integrates ethical reasoning, structured rule interpretation, and multi-agent deliberation within a unified architecture.
arXiv Detail & Related papers (2026-02-09T03:38:55Z)
- PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm [39.043933213898136]
Current AI safety frameworks, which often treat harmfulness as binary, lack the flexibility to handle borderline cases where humans disagree. We introduce PluriHarms, a benchmark designed to study human harm judgments across two key dimensions: the harm axis (benign to harmful) and the agreement axis (agreement to disagreement). Our scalable framework generates prompts that capture diverse AI harms and human values while targeting cases with high disagreement rates, validated by human data.
arXiv Detail & Related papers (2026-01-13T19:41:11Z)
- EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations [57.97838850473147]
Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI. Our study unveils a critical, overlooked vulnerability: their susceptibility to subtle symbolic perturbations. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts.
arXiv Detail & Related papers (2025-12-01T06:53:49Z)
- AI Deception: Risks, Dynamics, and Controls [153.71048309527225]
This project provides a comprehensive and up-to-date overview of the AI deception field. We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
arXiv Detail & Related papers (2025-11-27T16:56:04Z)
- Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation [1.5892420496333068]
ECHO is a novel framework for proactive AI harm anticipation. It enables early-stage detection of bias-to-harm pathways. We validate ECHO in two high-stakes domains (disease diagnosis and hiring).
arXiv Detail & Related papers (2025-11-27T07:25:21Z)
- SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models [60.8821834954637]
We present SafeRBench, the first benchmark that assesses LRM safety end-to-end. We pioneer the incorporation of risk categories and levels into input design. We introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units.
arXiv Detail & Related papers (2025-11-19T06:46:33Z)
- AI Harmonics: a human-centric and harms severity-adaptive AI risk assessment framework [4.84912384919978]
Existing AI risk assessment models focus on internal compliance, often neglecting diverse stakeholder perspectives and real-world consequences. We propose a paradigm shift to a human-centric, harm-severity adaptive approach grounded in empirical incident data. We present AI Harmonics, which includes a novel AI harm assessment metric (AIH) that leverages ordinal severity data to capture relative impact without requiring precise numerical estimates.
arXiv Detail & Related papers (2025-09-12T09:52:45Z)
- A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI [0.0]
This article introduces a conjecture that formalises a fundamental trade-off between provable correctness and broad data-mapping capacity in AI systems. By making this implicit trade-off explicit and open to rigorous verification, the conjecture significantly reframes both engineering ambitions and philosophical expectations for AI.
arXiv Detail & Related papers (2025-06-11T19:18:13Z)
- PRJ: Perception-Retrieval-Judgement for Generated Images [6.940819432582308]
Perception-Retrieval-Judgement (PRJ) is a framework that models toxicity detection as a structured reasoning process. PRJ follows a three-stage design: it first transforms an image into descriptive language (perception), then retrieves external knowledge related to harm categories and traits (retrieval), and finally evaluates toxicity based on legal or normative rules (judgement); a minimal structural sketch of this pipeline appears after this list. Experiments show that PRJ surpasses existing safety checkers in detection accuracy and robustness while uniquely supporting structured category-level toxicity interpretation.
arXiv Detail & Related papers (2025-06-04T08:13:53Z)
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
- SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior [56.10557932893919]
We present SafetyAnalyst, a novel AI safety moderation framework. Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences. It aggregates effects into a harmfulness score using 28 fully interpretable weight parameters; a sketch of this style of aggregation appears after this list.
arXiv Detail & Related papers (2024-10-22T03:38:37Z)
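As referenced in the PRJ entry above, the following is a minimal sketch of its three-stage pipeline as described in that abstract. The stage implementations here are placeholder assumptions; the real system uses vision-language and retrieval models the abstract does not specify.

```python
# A minimal sketch of a perception -> retrieval -> judgement pipeline.
from typing import List

def perceive(image_path: str) -> str:
    # Assumption: a captioning model would produce this description.
    return f"descriptive caption of {image_path}"

def retrieve(description: str, knowledge_base: List[str]) -> List[str]:
    # Assumption: naive keyword overlap stands in for real retrieval
    # over harm-category knowledge.
    return [doc for doc in knowledge_base if any(w in doc for w in description.split())]

def judge(description: str, evidence: List[str]) -> str:
    # Assumption: a rule-based or LLM judge would map the description
    # plus retrieved evidence to a verdict; this stub only checks evidence.
    return "toxic" if evidence else "benign"

def prj(image_path: str, knowledge_base: List[str]) -> str:
    description = perceive(image_path)                 # perception
    evidence = retrieve(description, knowledge_base)   # retrieval
    return judge(description, evidence)                # judgement
```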
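And for the SafetyAnalyst entry above, a minimal sketch of an interpretable weighted aggregation of the kind its abstract describes. The effect names, weights, and the linear rule are assumptions, not the paper's actual 28 parameters.

```python
# A minimal sketch of aggregating analyzed effects into a harmfulness
# score via fully interpretable weights.
from typing import Dict

def harmfulness_score(effects: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-effect scores.

    effects: effect name -> analyzed score in [0, 1]
    weights: effect name -> interpretable, auditable weight
    """
    return sum(weights[name] * value for name, value in effects.items())

# Hypothetical usage with 3 of the 28 interpretable dimensions:
effects = {"physical_harm": 0.1, "privacy_violation": 0.7, "misinformation": 0.4}
weights = {"physical_harm": 1.0, "privacy_violation": 0.6, "misinformation": 0.3}
print(harmfulness_score(effects, weights))  # 0.1 + 0.42 + 0.12 = 0.64
```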