Intentional Biases in LLM Responses
- URL: http://arxiv.org/abs/2311.07611v1
- Date: Sat, 11 Nov 2023 19:59:24 GMT
- Title: Intentional Biases in LLM Responses
- Authors: Nicklaus Badyal, Derek Jacoby, Yvonne Coady
- Abstract summary: We explore the differences between open-source models such as Falcon-7b and the GPT-4 model from OpenAI.
We find that the guardrails in the GPT-4 mixture-of-experts model with a supervisor are detrimental when trying to construct personas with a variety of uncommon viewpoints.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this study we intentionally introduce biases into large language model
responses in an attempt to create specific personas for interactive media
purposes. We explore the differences between open-source models such as
Falcon-7b and the GPT-4 model from OpenAI, and we quantify some differences in
responses afforded by the two systems. We find that the guardrails in the GPT-4
mixture-of-experts model with a supervisor, while useful for assuring AI
alignment in general, are detrimental when trying to construct personas with a
variety of uncommon viewpoints. This study aims to lay the groundwork for
future exploration of intentional biases in large language models so that
these practices can be applied in creative fields and new forms of media.
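As a rough illustration of the kind of setup the abstract describes, the sketch below sends the same persona instruction and question to an open-source model and to GPT-4. This is a minimal sketch under stated assumptions, not the authors' actual pipeline: the persona text, the probe question, and the generation parameters are invented for illustration, and it assumes the transformers and openai Python packages together with an OPENAI_API_KEY in the environment.

```python
# Minimal sketch: prompting the same persona into Falcon-7b and GPT-4 (illustrative only).
from openai import OpenAI
from transformers import pipeline

# Hypothetical persona instruction and probe question, not taken from the paper.
persona = ("You are a retired lighthouse keeper who distrusts modern technology "
           "and answers every question from that point of view.")
question = "What do you think about smartphones?"

# Open-source route: Falcon-7b-instruct via Hugging Face transformers.
falcon = pipeline("text-generation", model="tiiuae/falcon-7b-instruct")
falcon_out = falcon(f"{persona}\nUser: {question}\nAssistant:",
                    max_new_tokens=120, do_sample=True, temperature=0.8)
print(falcon_out[0]["generated_text"])

# Hosted route: GPT-4 via the OpenAI API, with the persona as a system message.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
gpt4_response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": persona},
              {"role": "user", "content": question}],
)
print(gpt4_response.choices[0].message.content)
```

Comparing how strongly each model stays in persona, and where GPT-4's guardrails override the instruction, is the kind of difference the study quantifies.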
Related papers
- The Pursuit of Fairness in Artificial Intelligence Models: A Survey [2.124791625488617]
This survey offers a synopsis of the different ways researchers have promoted fairness in AI systems.
A thorough study is conducted of the approaches and techniques employed by researchers to mitigate bias in AI models.
We also delve into the impact of biased models on user experience and the ethical considerations to contemplate when developing and deploying such models.
arXiv Detail & Related papers (2024-03-26T02:33:36Z)
- MAFIA: Multi-Adapter Fused Inclusive LanguAge Models [13.793816113015513]
Pretrained Language Models (PLMs) are widely used in NLP for various tasks.
Recent studies have identified various biases that such models exhibit and have proposed methods to correct these biases.
We propose a debiasing model that exploits the synergy amongst various societal biases and enables multi-bias debiasing simultaneously.
arXiv Detail & Related papers (2024-02-12T09:41:00Z)
- Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases [98.35348038111508]
This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision).
The core of our analysis delves into the distinct visual comprehension abilities of each model.
Our findings illuminate the unique strengths and niches of both models.
arXiv Detail & Related papers (2023-12-22T18:59:58Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses drawn from a large set of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Detecting Natural Language Biases with Prompt-based Learning [0.3749861135832073]
We explore how to design prompts that can indicate four different types of bias: (1) gender, (2) race, (3) sexual orientation, and (4) religion.
We apply these prompts to multiple variants of popular and well-recognized models (BERT, RoBERTa, and T5) to evaluate their biases.
We provide a comparative analysis of these models and assess them using a two-fold method: human judgment to decide whether model predictions are biased, and model-level judgment (through further prompts) to assess whether a model can self-diagnose the biases in its own predictions.
arXiv Detail & Related papers (2023-09-11T04:20:36Z)
- Soft-prompt Tuning for Large Language Models to Evaluate Bias [0.03141085922386211]
Using soft prompts to evaluate bias has the added advantage of avoiding human-introduced bias in the probes.
We check model biases across different sensitive attributes using group fairness (bias) metrics and find interesting bias patterns.
arXiv Detail & Related papers (2023-06-07T19:11:25Z)
- Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models [11.323961700172175]
This article investigates the challenges and risks associated with biases in large-scale language models like ChatGPT.
We discuss the origins of biases, stemming from, among other sources, the nature of the training data, model specifications, algorithmic constraints, product design, and policy decisions.
We review the current approaches to identify, quantify, and mitigate biases in language models, emphasizing the need for a multi-disciplinary, collaborative effort to develop more equitable, transparent, and responsible AI systems.
arXiv Detail & Related papers (2023-04-07T17:14:00Z)
- Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models; a minimal sketch of this projection idea appears after this list.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
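As referenced in the Debiasing Vision-Language Models via Biased Prompts entry above, the core idea of projecting a biased direction out of text embeddings can be sketched in a few lines of numpy. This is a minimal illustration under assumed inputs (random vectors standing in for real text embeddings from a counterfactual prompt pair), not the paper's calibrated projection method.

```python
# Minimal sketch: removing one estimated bias direction from a text embedding.
import numpy as np

def remove_direction(embedding: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project out the component of `embedding` along `direction`."""
    v = direction / np.linalg.norm(direction)
    return embedding - np.dot(embedding, v) * v

rng = np.random.default_rng(0)
# Stand-ins for embeddings of a counterfactual prompt pair, e.g.
# "a photo of a man" vs. "a photo of a woman" (values are random here).
emb_a = rng.normal(size=512)
emb_b = rng.normal(size=512)
bias_direction = emb_a - emb_b            # estimated biased direction in embedding space

text_embedding = rng.normal(size=512)     # stand-in for a classifier prompt embedding
debiased = remove_direction(text_embedding, bias_direction)

# The debiased embedding is orthogonal to the bias direction (dot product ~ 0).
print(np.dot(debiased, bias_direction / np.linalg.norm(bias_direction)))
```

Equivalently, a projection matrix I - vv^T can be built once and applied to every embedding; the calibration described in the entry above is what makes this practical for robust classifiers and fair generative models.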
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.