Towards Fair and Explainable AI using a Human-Centered AI Approach
- URL: http://arxiv.org/abs/2306.07427v1
- Date: Mon, 12 Jun 2023 21:08:55 GMT
- Title: Towards Fair and Explainable AI using a Human-Centered AI Approach
- Authors: Bhavya Ghai
- Abstract summary: We present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings.
The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers.
The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in datasets.
The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups.
The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social biases in creative writing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of machine learning (ML) is accompanied by several high-profile
cases that have stressed the need for fairness, accountability, explainability
and trust in ML systems. The existing literature has largely focused on fully
automated ML approaches that try to optimize for some performance metric.
However, human-centric measures like fairness, trust, explainability, etc. are
subjective in nature, context-dependent, and might not correlate with
conventional performance metrics. To deal with these challenges, we explore a
human-centered AI approach that empowers people by providing more transparency
and human control.
In this dissertation, we present 5 research projects that aim to enhance
explainability and fairness in classification systems and word embeddings. The
first project explores the utility/downsides of introducing local model
explanations as interfaces for machine teachers (crowd workers). Our study
found that adding explanations supports trust calibration for the resulting ML
model and enables rich forms of teaching feedback. The second project presents
D-BIAS, a causality-based human-in-the-loop visual tool for identifying and
mitigating social biases in tabular datasets. Apart from fairness, we found
that our tool also enhances trust and accountability. The third project
presents WordBias, a visual interactive tool that helps audit pre-trained
static word embeddings for biases against groups, such as females, or
subgroups, such as Black Muslim females. The fourth project presents DramatVis
Personae, a visual analytics tool that helps identify social biases in creative
writing. Finally, the last project presents an empirical study aimed at
understanding the cumulative impact of multiple fairness-enhancing
interventions at different stages of the ML pipeline on fairness, utility and
different population groups. We conclude by discussing some of the future
directions.
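The WordBias project above audits static word embeddings for group associations. A minimal sketch of that kind of audit is below: a word's bias score is its mean cosine similarity to one group's attribute words minus its mean similarity to another's. The 3-dimensional toy vectors and word lists are illustrative assumptions, standing in for real pretrained embeddings such as GloVe or word2vec.

```python
import math

# Toy 3-d embeddings standing in for real pretrained word vectors
# (hypothetical values chosen only to illustrate the scoring scheme).
emb = {
    "nurse":    [0.9, 0.1, 0.2],
    "engineer": [0.1, 0.9, 0.3],
    "she":      [1.0, 0.0, 0.1],
    "her":      [0.9, 0.1, 0.0],
    "he":       [0.0, 1.0, 0.1],
    "him":      [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bias_score(word, group_a, group_b):
    """Positive -> word sits closer to group_a's attribute terms;
    negative -> closer to group_b's."""
    mean_sim = lambda grp: sum(cosine(emb[word], emb[w]) for w in grp) / len(grp)
    return mean_sim(group_a) - mean_sim(group_b)

print(round(bias_score("nurse", ["she", "her"], ["he", "him"]), 2))
print(round(bias_score("engineer", ["she", "her"], ["he", "him"]), 2))
```

With these toy vectors, "nurse" scores positive (female-associated) and "engineer" scores negative, which is the kind of association pattern such a tool surfaces for inspection.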
Related papers
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - TIDE: Textual Identity Detection for Evaluating and Augmenting Classification and Language Models [0.0]
Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets.
We present a dataset coupled with an approach to improve text fairness in classifiers and language models.
We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context.
arXiv Detail & Related papers (2023-09-07T21:44:42Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z) - Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY).
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
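The attribution scores this summary mentions can be sketched concretely. The snippet below computes occlusion-based attributions for a toy bag-of-words sentiment scorer and renders the top features as a textual explanation of the kind that could be prepended to a few-shot prompt. The occlusion method and the toy model are illustrative assumptions, not the paper's exact setup.

```python
def occlusion_attributions(predict, x, baseline=0.0):
    """Score each feature by the prediction drop when it is occluded
    (replaced by a baseline value)."""
    base_score = predict(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(base_score - predict(occluded))
    return scores

def to_explanation(tokens, scores, top_k=2):
    """Format the top-k most influential tokens as a prompt snippet."""
    ranked = sorted(zip(tokens, scores), key=lambda p: -abs(p[1]))[:top_k]
    return "Key words: " + ", ".join(f"{t} ({s:+.2f})" for t, s in ranked)

# Toy sentiment "model": a linear scorer over token-presence features
# (hypothetical weights, for illustration only).
weights = {"great": 1.0, "terrible": -1.2, "movie": 0.1}
tokens = ["great", "movie"]
x = [1.0, 1.0]  # each token is present

def predict(feats):
    return sum(weights[t] * f for t, f in zip(tokens, feats))

scores = occlusion_attributions(predict, x)
print(to_explanation(tokens, scores))
```

Here "great" receives the largest attribution, so the generated explanation highlights it first; AMPLIFY's reported gains come from feeding such explanations back into in-context learning.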
arXiv Detail & Related papers (2023-05-19T04:46:04Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies the human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
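The edge-weakening interaction described above can be sketched with a tiny linear structural causal model: deleting the edge from a protected attribute to an outcome and resimulating yields a "debiased" dataset. The variable names, coefficients, and linear form are hypothetical stand-ins, not the paper's method for real tabular data.

```python
import random

random.seed(0)

def simulate(n, gender_to_income=1.5):
    """Draw n rows from a toy linear SCM:
    income <- gender_to_income * gender + 1.0 * education + noise.
    Setting gender_to_income=0.0 corresponds to deleting that causal edge."""
    rows = []
    for _ in range(n):
        gender = random.choice([0, 1])
        education = 2.0 * random.random()          # independent of gender here
        income = (gender_to_income * gender
                  + 1.0 * education
                  + random.gauss(0, 0.1))
        rows.append({"gender": gender, "education": education, "income": income})
    return rows

def mean_gap(rows):
    """Mean income difference between gender groups."""
    g1 = [r["income"] for r in rows if r["gender"] == 1]
    g0 = [r["income"] for r in rows if r["gender"] == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

biased = simulate(2000)                            # edge present
debiased = simulate(2000, gender_to_income=0.0)    # edge deleted
print(round(mean_gap(biased), 2), round(mean_gap(debiased), 2))
```

The income gap between groups is large in the original simulation and vanishes once the biased edge is removed, which is the before/after comparison the visual tool lets a user inspect.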
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Towards Involving End-users in Interactive Human-in-the-loop AI Fairness [1.889930012459365]
Ensuring fairness in artificial intelligence (AI) is important to counteract bias and discrimination in far-reaching applications.
Recent work has started to investigate how humans judge fairness and how to support machine learning (ML) experts in making their AI models fairer.
Our work explores designing interpretable and interactive human-in-the-loop interfaces that allow ordinary end-users to identify potential fairness issues.
arXiv Detail & Related papers (2022-04-22T02:24:11Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - FAIR: Fair Adversarial Instance Re-weighting [0.7829352305480285]
We propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions.
To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about fairness of individual instances.
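The per-instance weighting idea can be illustrated with a simpler, closed-form scheme: FAIR learns its weights adversarially, but classic reweighing (in the style of Kamiran and Calders) already shows how per-instance weights can decouple group membership from the label. The toy data below is an assumption for illustration.

```python
from collections import Counter

# Toy dataset of (group, label) pairs: group "a" is mostly positive,
# group "b" mostly negative (hypothetical, for illustration).
samples = ([("a", 1)] * 30 + [("a", 0)] * 10
           + [("b", 1)] * 10 + [("b", 0)] * 30)

n = len(samples)
group_counts = Counter(g for g, _ in samples)
label_counts = Counter(y for _, y in samples)
joint_counts = Counter(samples)

def weight(g, y):
    """Closed-form reweighing: w(g, y) = P(g) * P(y) / P(g, y).
    Up-weights under-represented (group, label) combinations so that,
    under the weights, group and label are independent."""
    return ((group_counts[g] / n) * (label_counts[y] / n)
            / (joint_counts[(g, y)] / n))

def weighted_pos_rate(group):
    """Weighted P(label = 1 | group)."""
    num = sum(weight(g, y) for g, y in samples if g == group and y == 1)
    den = sum(weight(g, y) for g, y in samples if g == group)
    return num / den

print(weighted_pos_rate("a"), weighted_pos_rate("b"))
```

Under the weights, both groups have the same positive rate; FAIR's contribution is to learn such a weighting function adversarially so the weights are also interpretable per instance.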
arXiv Detail & Related papers (2020-11-15T10:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.