Adversarial Scrutiny of Evidentiary Statistical Software
- URL: http://arxiv.org/abs/2206.09305v2
- Date: Fri, 30 Sep 2022 21:51:08 GMT
- Title: Adversarial Scrutiny of Evidentiary Statistical Software
- Authors: Rediet Abebe, Moritz Hardt, Angela Jin, John Miller, Ludwig Schmidt,
Rebecca Wexler
- Abstract summary: The U.S. criminal legal system increasingly relies on software output to convict and incarcerate people.
We propose robust adversarial testing as an audit framework to examine the validity of evidentiary statistical software.
- Score: 32.962815960406196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The U.S. criminal legal system increasingly relies on software output to
convict and incarcerate people. In a large number of cases each year, the
government makes these consequential decisions based on evidence from
statistical software -- such as probabilistic genotyping, environmental audio
detection, and toolmark analysis tools -- that defense counsel cannot fully
cross-examine or scrutinize. This undermines the commitments of the adversarial
criminal legal system, which relies on the defense's ability to probe and test
the prosecution's case to safeguard individual rights.
Responding to this need to adversarially scrutinize output from such
software, we propose robust adversarial testing as an audit framework to
examine the validity of evidentiary statistical software. We define and
operationalize this notion of robust adversarial testing for defense use by
drawing on a large body of recent work in robust machine learning and
algorithmic fairness. We demonstrate how this framework both standardizes the
process for scrutinizing such tools and empowers defense lawyers to examine
their validity for instances most relevant to the case at hand. We further
discuss existing structural and institutional challenges within the U.S.
criminal legal system that may create barriers for implementing this and other
such audit frameworks and close with a discussion on policy changes that could
help address these concerns.
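The proposed notion of robust adversarial testing can be illustrated with a minimal sketch: treat the evidentiary tool as a black box and probe whether its output is stable on plausible variants of the case-relevant input. All names here (`robust_adversarial_test`, `predict`, `perturb`, the toy thresholded tool) are hypothetical illustrations of the idea, not the paper's actual framework.

```python
def robust_adversarial_test(predict, instance, perturb, tolerance=0.0):
    """Probe a black-box tool's output stability on a case-relevant input.

    predict  -- callable mapping an input to a numeric score (hypothetical)
    instance -- the case-relevant input under scrutiny
    perturb  -- callable yielding plausible variants of the instance
    """
    baseline = predict(instance)
    worst_dev = 0.0
    for variant in perturb(instance):
        worst_dev = max(worst_dev, abs(predict(variant) - baseline))
    return {"baseline": baseline,
            "worst_case_deviation": worst_dev,
            "robust": worst_dev <= tolerance}

# Toy "tool": a hard threshold that is unstable near its decision boundary.
tool = lambda x: 1.0 if x >= 0.5 else 0.0
report = robust_adversarial_test(
    tool, 0.49,
    perturb=lambda x: (x + d for d in (-0.02, -0.01, 0.01, 0.02)),
)
print(report)  # small perturbations flip the output, so "robust" is False
```

The key point of the framework survives even in this toy form: the defense examines validity on the neighborhood of inputs most relevant to the case at hand, not only on the vendor's global validation set.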
Related papers
- From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing [1.196505602609637]
Audits can take many forms, including pre-deployment risk assessments, ongoing monitoring, and compliance testing.
There are many operational challenges to AI auditing that complicate its implementation.
We argue that auditing can be cast as a natural hypothesis test, draw parallels between hypothesis testing and legal procedure, and show that this framing provides clear and interpretable guidance on audit implementation.
arXiv Detail & Related papers (2024-10-07T06:15:46Z)
- Metamorphic Debugging for Accountable Software [8.001739956625483]
Translating legalese into formal specifications is one challenge.
Lack of a definitive 'truth' for queries (the oracle problem) is another.
We propose that these challenges can be tackled by focusing on relational specifications.
arXiv Detail & Related papers (2024-09-24T14:45:13Z)
- Automating Semantic Analysis of System Assurance Cases using Goal-directed ASP [1.2189422792863451]
We present our approach to enhancing Assurance 2.0 with semantic rule-based analysis capabilities.
We examine the unique semantic aspects of assurance cases, such as logical consistency, adequacy, indefeasibility, etc.
arXiv Detail & Related papers (2024-08-21T15:22:43Z)
- (Beyond) Reasonable Doubt: Challenges that Public Defenders Face in Scrutinizing AI in Court [7.742399489996169]
We study efforts to contest AI systems in practice by studying how public defenders scrutinize AI in court.
We present findings from interviews with 17 people in the U.S. public defense community.
arXiv Detail & Related papers (2024-03-13T23:19:46Z)
- The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording [51.82772358241505]
Risk-limiting audits (RLAs) are techniques for verifying the outcomes of large elections.
We define new families of audits that improve efficiency and offer advances in statistical power.
New audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations.
arXiv Detail & Related papers (2024-02-09T16:23:54Z)
- Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition [56.69875958980474]
This work considers approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations.
We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training.
We compare the equality of two rejection-based defenses, randomized smoothing and neural rejection, and find randomized smoothing more equitable for minority groups due to its sampling mechanism.
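For context on the first of these defenses: randomized smoothing classifies by majority vote of the base model under random input noise. The following one-dimensional toy sketch shows the decision rule only; the function name, noise scale, and classifier are illustrative assumptions, not the paper's experimental setup.

```python
import random

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=500, seed=0):
    """Majority vote of the base classifier under Gaussian input noise."""
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    votes = {}
    for _ in range(n_samples):
        noisy = x + rng.gauss(0.0, sigma)
        label = base_classifier(noisy)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy base classifier: a hard threshold at 0.5.
base = lambda x: 1 if x >= 0.5 else 0
print(smoothed_predict(base, 0.8))  # well inside class 1
print(smoothed_predict(base, 0.2))  # well inside class 0
```

Inputs far from the decision boundary keep their label under noise, while borderline inputs get unstable votes; the equity question the paper raises is who ends up near that boundary.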
arXiv Detail & Related papers (2023-02-17T16:19:26Z)
- Entity Graph Extraction from Legal Acts -- a Prototype for a Use Case in Policy Design Analysis [52.77024349608834]
This paper presents a prototype developed to serve the quantitative study of public policy design.
Our system aims to automate the process of gathering legal documents, annotating them with Institutional Grammar, and using hypergraphs to analyse inter-relations between crucial entities.
arXiv Detail & Related papers (2022-09-02T10:57:47Z)
- Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest [70.02478301291264]
Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse.
Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes.
We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation.
arXiv Detail & Related papers (2022-07-18T17:32:35Z)
- System Cards for AI-Based Decision-Making for Public Policy [5.076419064097733]
This work proposes a system accountability benchmark for formal audits of artificial intelligence-based decision-aiding systems.
It consists of 56 criteria organized within a four-by-four matrix composed of rows focused on (i) data, (ii) model, (iii) code, (iv) system, and columns focused on (a) development, (b) assessment, (c) mitigation, and (d) assurance.
arXiv Detail & Related papers (2022-03-01T18:56:45Z)
- Equality before the Law: Legal Judgment Consistency Analysis for Fairness [55.91612739713396]
In this paper, we propose an evaluation metric for judgment inconsistency, the Legal Inconsistency Coefficient (LInCo).
We simulate judges from different groups with legal judgment prediction (LJP) models and measure the judicial inconsistency with the disagreement of the judgment results given by LJP models trained on different groups.
We employ LInCo to explore the inconsistency in real cases and come to the following observations: (1) Both regional and gender inconsistency exist in the legal system, but gender inconsistency is much less than regional inconsistency.
arXiv Detail & Related papers (2021-03-25T14:28:00Z)
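The disagreement-based measurement described in that last entry can be sketched as an average pairwise disagreement rate between models trained on different groups. This is a simplified stand-in; the paper's actual LInCo definition may differ, and the judgments below are invented illustrative data.

```python
from itertools import combinations

def inconsistency_coefficient(group_predictions):
    """Average pairwise disagreement rate between per-group model predictions.

    group_predictions -- one list of predicted judgments per group,
                         all over the same set of cases.
    """
    pairs = list(combinations(group_predictions, 2))
    total = 0.0
    for preds_a, preds_b in pairs:
        disagreements = sum(a != b for a, b in zip(preds_a, preds_b))
        total += disagreements / len(preds_a)
    return total / len(pairs)

# Hypothetical judgments from models trained on three regional groups.
region_models = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
print(round(inconsistency_coefficient(region_models), 3))  # 0.267
```

A coefficient of 0 would mean all group-trained models agree on every case; larger values indicate systematic inconsistency attributable to the grouping (e.g., region or gender).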
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.