Nishpaksh: TEC Standard-Compliant Framework for Fairness Auditing and Certification of AI Models
- URL: http://arxiv.org/abs/2601.16926v1
- Date: Fri, 23 Jan 2026 17:35:05 GMT
- Title: Nishpaksh: TEC Standard-Compliant Framework for Fairness Auditing and Certification of AI Models
- Authors: Shashank Prakash, Ranjitha Prasad, Avinash Agarwal
- Abstract summary: We propose Nishpaksh, an indigenous fairness evaluation tool that operationalizes the Telecommunication Engineering Centre (TEC) Standard for the Evaluation and Rating of Artificial Intelligence Systems. Nishpaksh integrates survey-based risk quantification, contextual threshold determination, and quantitative fairness evaluation into a unified, web-based dashboard.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The growing reliance on Artificial Intelligence (AI) models in high-stakes decision-making systems, particularly within emerging telecom and 6G applications, underscores the urgent need for transparent and standardized fairness assessment frameworks. While global toolkits such as IBM AI Fairness 360 and Microsoft Fairlearn have advanced bias detection, they often lack alignment with region-specific regulatory requirements and national priorities. To address this gap, we propose Nishpaksh, an indigenous fairness evaluation tool that operationalizes the Telecommunication Engineering Centre (TEC) Standard for the Evaluation and Rating of Artificial Intelligence Systems. Nishpaksh integrates survey-based risk quantification, contextual threshold determination, and quantitative fairness evaluation into a unified, web-based dashboard. The tool employs vectorized computation, reactive state management, and certification-ready reporting to enable reproducible, audit-grade assessments, thereby addressing a critical post-standardization implementation need. Experimental validation on the COMPAS dataset demonstrates Nishpaksh's effectiveness in identifying attribute-specific bias and generating standardized fairness scores compliant with the TEC framework. The system bridges the gap between research-oriented fairness methodologies and regulatory AI governance in India, marking a significant step toward responsible and auditable AI deployment within critical infrastructure like telecommunications.
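As a rough illustration of the kind of attribute-specific, quantitative fairness evaluation the abstract describes (not the Nishpaksh implementation or the TEC scoring formula themselves), the sketch below computes per-group selection rates, statistical parity difference, and disparate impact on a toy COMPAS-like table. The column names, the choice of privileged group, and the 0.8 disparate-impact threshold are illustrative assumptions only.

```python
# Minimal sketch of group-fairness metrics of the sort a fairness audit might
# report per protected attribute. Assumptions (not from the paper): column
# names, the privileged group, and the 0.8 disparate-impact threshold.
import pandas as pd


def group_fairness_report(df: pd.DataFrame,
                          pred_col: str,
                          attr_col: str,
                          privileged: str,
                          di_threshold: float = 0.8) -> pd.DataFrame:
    """Per-group selection rates plus statistical parity difference and
    disparate impact ratio relative to the privileged group."""
    rates = df.groupby(attr_col)[pred_col].mean()       # vectorized selection rate per group
    base = rates[privileged]                             # reference (privileged) selection rate
    report = pd.DataFrame({
        "selection_rate": rates,
        "statistical_parity_diff": rates - base,
        "disparate_impact": rates / base,
    })
    report["passes_di_rule"] = report["disparate_impact"] >= di_threshold
    return report


# Toy COMPAS-like example with synthetic values (not the real dataset):
toy = pd.DataFrame({
    "race": ["A", "A", "B", "B", "B", "A", "B", "A"],
    "predicted_low_risk": [1, 1, 0, 1, 0, 1, 0, 1],
})
print(group_fairness_report(toy, "predicted_low_risk", "race", privileged="A"))
```

Metrics like these would then feed a standard-specific aggregation and rating step; how the TEC framework weights and thresholds them is defined by the standard itself and is not reproduced here.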
Related papers
- AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems [52.65695508605237]
We introduce AI-NativeBench, the first application-centric and white-box AI-Native benchmark suite grounded in Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards. By treating agentic spans as first-class citizens within distributed traces, our methodology enables granular analysis of engineering characteristics beyond simple capabilities. This work provides the first systematic evidence to guide the transition from measuring model capability to engineering reliable AI-Native systems.
arXiv Detail & Related papers (2026-01-14T11:32:07Z) - Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications. We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z) - Lost in Vagueness: Towards Context-Sensitive Standards for Robustness Assessment under the EU AI Act [2.740981829798319]
Robustness is a key requirement for high-risk AI systems under the EU Artificial Intelligence Act (AI Act). This paper investigates what it means for AI systems to be robust and illustrates the need for context-sensitive standardisation.
arXiv Detail & Related papers (2025-11-19T17:06:36Z) - Variance-Bounded Evaluation of Entity-Centric AI Systems Without Ground Truth: Theory and Measurement [0.0]
We introduce VB-Score, a variance-bounded evaluation framework for entity-centric AI systems. VB-Score enumerates plausible interpretations through constraint relaxation and Monte Carlo sampling. It then evaluates system outputs by their expected success across interpretations, penalized by variance to assess the robustness of the system.
arXiv Detail & Related papers (2025-09-26T07:54:38Z) - Safe and Certifiable AI Systems: Concepts, Challenges, and Lessons Learned [45.44933002008943]
This white paper presents the TÜV AUSTRIA Trusted AI framework. It is an end-to-end audit catalog and methodology for assessing and certifying machine learning systems. Building on three pillars - Secure Software Development, Functional Requirements, and Ethics & Data Privacy - it translates the high-level obligations of the EU AI Act into specific, testable criteria.
arXiv Detail & Related papers (2025-09-08T17:52:08Z) - INSEva: A Comprehensive Chinese Benchmark for Large Language Models in Insurance [48.22571187209047]
INSEva is a Chinese benchmark specifically designed for evaluating AI systems' knowledge and capabilities in insurance. INSEva features a multi-dimensional evaluation taxonomy covering business areas, task formats, difficulty levels, and cognitive-knowledge dimensions. Our benchmark implements tailored evaluation methods for assessing both faithfulness and completeness in open-ended responses.
arXiv Detail & Related papers (2025-08-27T03:13:40Z) - SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection [70.23196257213829]
We propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection. Our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains. We then leverage large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels.
arXiv Detail & Related papers (2025-03-05T09:37:05Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - A Unified Framework for Evaluating the Effectiveness and Enhancing the Transparency of Explainable AI Methods in Real-World Applications [2.0681376988193843]
This study introduces a single evaluation framework for XAI. It combines quantitative metrics and user feedback to check whether explanations are correct, easy to understand, fair, complete, and reliable. We show the value of this framework through case studies in healthcare, finance, farming, and self-driving systems.
arXiv Detail & Related papers (2024-12-05T05:30:10Z) - Fairness Score and Process Standardization: Framework for Fairness Certification in Artificial Intelligence Systems [0.4297070083645048]
We propose a novel Fairness Score to measure the fairness of a data-driven AI system.
It will also provide a framework to operationalise the concept of fairness and facilitate the commercial deployment of such systems.
arXiv Detail & Related papers (2022-01-10T15:45:12Z) - Multisource AI Scorecard Table for System Evaluation [3.74397577716445]
The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist.
The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
arXiv Detail & Related papers (2021-02-08T03:37:40Z)