Beyond Behaviorist Representational Harms: A Plan for Measurement and Mitigation
- URL: http://arxiv.org/abs/2402.01705v2
- Date: Mon, 6 May 2024 21:00:00 GMT
- Title: Beyond Behaviorist Representational Harms: A Plan for Measurement and Mitigation
- Authors: Jennifer Chien, David Danks
- Abstract summary: This study examines current definitions of representational harms to discern what they include and what they omit.
Our work highlights the unique vulnerabilities of large language models to perpetrating representational harms.
The overarching aim of this research is to establish a framework for broadening the definition of representational harms.
- Score: 1.7355698649527407
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Algorithmic harms are commonly categorized as either allocative or representational. This study specifically addresses the latter, focusing on an examination of current definitions of representational harms to discern what is included and what is not. This analysis motivates our expansion beyond behavioral definitions to encompass harms to cognitive and affective states. The paper outlines high-level requirements for measurement: identifying the necessary expertise to implement this approach and illustrating it through a case study. Our work highlights the unique vulnerabilities of large language models to perpetrating representational harms, particularly when these harms go unmeasured and unmitigated. The work concludes by presenting proposed mitigations and delineating when to employ them. The overarching aim of this research is to establish a framework for broadening the definition of representational harms and to translate insights from fairness research into practical measurement and mitigation praxis.
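The framework itself is presented conceptually rather than as code. As a loose, hypothetical illustration of what measurement beyond behavior might look like in practice, the sketch below tags a model output along behavioral, cognitive, and affective harm dimensions; every name in it (`audit`, `HarmReport`, the annotator callables) is invented for illustration and is not the authors' instrument.

```python
from dataclasses import dataclass, field

# Hypothetical harm dimensions; the paper argues that measurement should
# cover cognitive and affective states, not only observable behavior.
DIMENSIONS = ("behavioral", "cognitive", "affective")

@dataclass
class HarmReport:
    text: str
    flags: dict = field(default_factory=dict)

def audit(text: str, annotators: dict) -> HarmReport:
    """Run one annotator per harm dimension and collect its judgment.

    `annotators` maps a dimension name to any callable str -> bool;
    in practice these would be trained classifiers or human raters.
    """
    report = HarmReport(text)
    for dim in DIMENSIONS:
        judge = annotators.get(dim)
        report.flags[dim] = bool(judge(text)) if judge else None
    return report

# Toy usage: a keyword heuristic stands in for a real annotator.
demo = audit("example model output",
             {"behavioral": lambda t: "slur" in t.lower()})
print(demo.flags)  # {'behavioral': False, 'cognitive': None, 'affective': None}
```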
Related papers
- A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication [15.879482578829489]
Deep generative models have demonstrated impressive performance in various computer vision applications.
These models may be used for malicious purposes, such as misinformation, deception, and copyright violation.
This paper provides a systematic and timely review of research efforts on defenses against AI-generated visual media.
arXiv Detail & Related papers (2024-07-15T09:46:02Z)
- The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning [70.16523526957162]
Understanding commonsense causality helps people understand the principles of the real world better.
Despite its significance, a systematic exploration of this topic is notably lacking.
Our work aims to provide a systematic overview, update scholars on recent advancements, and provide a pragmatic guide for beginners.
arXiv Detail & Related papers (2024-06-27T16:30:50Z)
- Towards Non-Adversarial Algorithmic Recourse [20.819764720587646]
It has been argued that what distinguishes adversarial examples from counterfactual explanations is that they induce a misclassification relative to the ground truth.
We introduce non-adversarial algorithmic recourse and outline why in high-stakes situations, it is imperative to obtain counterfactual explanations that do not exhibit adversarial characteristics.
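For context, a common baseline for counterfactual explanations is gradient search in the style of Wachter et al. (not necessarily this paper's method): flip the model's prediction while staying close to the original input. A non-adversarial recourse method would add further constraints on top. A minimal sketch, assuming a differentiable PyTorch classifier and a batched input of shape (1, d):

```python
import torch

def counterfactual(model, x, target, steps=200, lr=0.05, dist_weight=0.5):
    """Baseline counterfactual search: move x toward the target class
    while penalizing distance from the original input.

    A *non-adversarial* recourse method would add constraints (e.g.
    plausibility, or agreement with the ground-truth labeling function)
    on top of this objective; those are omitted here.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(
            model(x_cf), torch.tensor([target])
        ) + dist_weight * torch.norm(x_cf - x)
        loss.backward()
        opt.step()
    return x_cf.detach()
```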
arXiv Detail & Related papers (2024-03-15T14:18:21Z)
- An Investigation of Representation and Allocation Harms in Contrastive Learning [55.42336321517228]
We demonstrate that contrastive learning (CL) tends to collapse representations of minority groups with certain majority groups.
We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using popular CL methods for each modality.
We provide a theoretical explanation for representation harm using a neural block model that leads to a representational collapse in a contrastive learning setting.
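A crude, hypothetical way to probe for this kind of collapse is to compare group centroids in the learned embedding space: if a minority group's mean embedding is nearly identical to a majority group's, downstream heads cannot tell the groups apart. A minimal sketch (the encoder and group labels are assumed given; this is not the paper's code):

```python
import torch
import torch.nn.functional as F

def centroid_similarity(embeddings, groups, g_min, g_maj):
    """Cosine similarity between the mean embeddings of two groups.

    Values near 1.0 suggest the groups' representations have collapsed
    onto each other -- the representation harm described in the paper.
    """
    z = F.normalize(embeddings, dim=1)
    c_min = z[groups == g_min].mean(0)
    c_maj = z[groups == g_maj].mean(0)
    return F.cosine_similarity(c_min, c_maj, dim=0).item()

# Toy usage with random embeddings and binary group labels.
z = torch.randn(512, 128)
g = torch.randint(0, 2, (512,))
print(centroid_similarity(z, g, g_min=1, g_maj=0))
```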
arXiv Detail & Related papers (2023-10-02T19:25:37Z)
- Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey [50.58063811745676]
This work provides a survey of practical methods for addressing potential threats and societal harms from language generation models.
We draw on several prior analyses of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks and harms from language generators.
arXiv Detail & Related papers (2022-10-14T10:43:39Z)
- A Principled Design of Image Representation: Towards Forensic Tasks [75.40968680537544]
We investigate the forensic-oriented image representation as a distinct problem, from the perspectives of theory, implementation, and application.
At the theoretical level, we propose a new representation framework for forensics, called Dense Invariant Representation (DIR), which is characterized by stable description with mathematical guarantees.
We demonstrate the above arguments through dense-domain pattern detection and matching experiments, providing comparisons with state-of-the-art descriptors.
arXiv Detail & Related papers (2022-03-02T07:46:52Z)
- A dual benchmarking study of facial forgery and facial forensics [28.979062525272866]
In recent years, visual forgery has reached a level of sophistication at which humans can no longer identify the fraud.
A rich body of visual forensic techniques has been proposed in an attempt to stop this dangerous trend.
We present a benchmark that provides in-depth insights into visual forgery and visual forensics.
arXiv Detail & Related papers (2021-11-25T05:01:08Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the supervision signals needed to learn discriminative representations for segmentation.
We therefore propose adversarial self-supervision UDA (ASSUDA), which maximizes the agreement between clean images and their adversarial examples via a contrastive loss in the output space.
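As a rough stand-in for the paper's output-space contrastive loss, the sketch below scores agreement between the per-pixel class distributions of a clean image and its adversarial counterpart using a symmetric KL term; the adversarial input is assumed to come from any standard attack, and this is an illustration of the "maximize agreement" idea rather than the exact ASSUDA objective.

```python
import torch
import torch.nn.functional as F

def output_agreement_loss(model, x_clean, x_adv):
    """Penalize disagreement between per-pixel class distributions of a
    clean image and its adversarial counterpart (symmetric KL).

    `model` returns segmentation logits of shape (B, C, H, W).
    """
    p = F.log_softmax(model(x_clean), dim=1)
    q = F.log_softmax(model(x_adv), dim=1)
    kl_pq = F.kl_div(q, p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(p, q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)
```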
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective [1.933681537640272]
Adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms.
Because deep learning algorithms are used in security-critical applications, such as biometric recognition systems and self-driving cars, defending against these examples is a pressing concern.
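The canonical instance of such an optimization is the fast gradient sign method (FGSM) of Goodfellow et al.; a minimal sketch, assuming a standard PyTorch classifier and inputs scaled to [0, 1]:

```python
import torch

def fgsm(model, x, y, eps=8 / 255):
    """Fast gradient sign method: one signed-gradient step that nudges
    each pixel by eps in the direction that increases the loss.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep a valid image range
```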
arXiv Detail & Related papers (2020-09-08T13:21:55Z)
- Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [53.484562601127195]
We point out that behavioral conclusions cannot be inferred from probing results.
We offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
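The amnesic idea can be caricatured in a few lines: fit a linear probe for a property, project its direction out of the representations, and check whether task behavior changes. The paper's actual procedure uses iterative nullspace projection (INLP); the single-direction version below is a simplified sketch, and `task_head` is purely hypothetical.

```python
import numpy as np

def remove_direction(H, w):
    """Project representations H (n x d) onto the nullspace of probe
    direction w (d,), erasing the linearly-decodable property.
    """
    w = w / np.linalg.norm(w)
    return H - np.outer(H @ w, w)

# Amnesic comparison: if task predictions shift once the property is
# removed, the model was plausibly *using* that information.
# preds_before = task_head(H)
# preds_after = task_head(remove_direction(H, w))
```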
arXiv Detail & Related papers (2020-06-01T15:00:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.