Truth Machines: Synthesizing Veracity in AI Language Models
- URL: http://arxiv.org/abs/2301.12066v1
- Date: Sat, 28 Jan 2023 02:47:50 GMT
- Title: Truth Machines: Synthesizing Veracity in AI Language Models
- Authors: Luke Munn, Liam Magee, Vanicka Arora
- Abstract summary: The article discusses the struggle for truth in AI systems and the general responses to date.
It then investigates the production of truth in InstructGPT, a large language model.
It argues that these same logics and inconsistencies play out in ChatGPT, reiterating truth as a non-trivial problem.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As AI technologies are rolled out into healthcare, academia, human resources,
law, and a multitude of other domains, they become de-facto arbiters of truth.
But truth is highly contested, with many different definitions and approaches.
This article discusses the struggle for truth in AI systems and the general
responses to date. It then investigates the production of truth in InstructGPT,
a large language model, highlighting how data harvesting, model architectures,
and social feedback mechanisms weave together disparate understandings of
veracity. It conceptualizes this performance as an operationalization of truth,
where distinct, often conflicting claims are smoothly synthesized and
confidently presented into truth-statements. We argue that these same logics
and inconsistencies play out in InstructGPT's successor, ChatGPT, reiterating
truth as a non-trivial problem. We suggest that enriching sociality and
thickening "reality" are two promising vectors for enhancing the
truth-evaluating capacities of future language models. We conclude, however, by
stepping back to consider AI truth-telling as a social practice: what kind of
"truth" do we as listeners desire?
Related papers
- When lies are mostly truthful: automated verbal deception detection for embedded lies [0.3867363075280544]
We collected a novel dataset of 2,088 truthful and deceptive statements with annotated embedded lies.
We show that a fine-tuned language model (Llama-3-8B) can classify truthful statements and those containing embedded lies with 64% accuracy.
arXiv Detail & Related papers (2025-01-13T11:16:05Z)
- AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents [27.10147264744531]
We study how language agents navigate scenarios with utility-truthfulness conflicts in a multi-turn interactive setting.
We develop a truthfulness detector inspired by psychological literature to assess the agents' responses.
Our experiment demonstrates that all models are truthful less than 50% of the time, although truthfulness and goal achievement (utility) rates vary across models.
arXiv Detail & Related papers (2024-09-13T17:41:12Z)
- On the consistent reasoning paradox of intelligence and optimal trust in AI: The power of 'I don't know' [79.69412622010249]
Consistent reasoning, which lies at the core of human intelligence, is the ability to handle tasks that are equivalent, that is, the same task posed in different ways.
The Consistent Reasoning Paradox (CRP) asserts that consistent reasoning implies fallibility -- in particular, human-like intelligence in AI necessarily comes with human-like fallibility.
arXiv Detail & Related papers (2024-08-05T10:06:53Z)
- Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? [53.98071556805525]
Neural language models (LMs) can be used to evaluate the truth of factual statements.
They can be queried for statement probabilities, or probed for internal representations of truthfulness.
Past work has found that these two procedures sometimes disagree, and that probes tend to be more accurate than LM outputs.
This has led some researchers to conclude that LMs "lie" or otherwise encode non-cooperative communicative intents.
arXiv Detail & Related papers (2023-11-27T18:59:14Z)
- Personas as a Way to Model Truthfulness in Language Models [23.86655844340011]
Large language models (LLMs) are trained on vast amounts of text from the internet.
This paper presents an explanation for why LMs appear to know the truth despite not being trained with truth labels.
arXiv Detail & Related papers (2023-10-27T14:27:43Z)
- Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
- Understanding Natural Language Understanding Systems. A Critical Analysis [91.81211519327161]
The development of machines that «talk like us», also known as Natural Language Understanding (NLU) systems, is the Holy Grail of Artificial Intelligence (AI).
But never has the trust that we can build «talking machines» been stronger than the one engendered by the last generation of NLU systems.
Are we at the dawn of a new era, in which the Grail is finally closer to us?
arXiv Detail & Related papers (2023-03-01T08:32:55Z)
- The Inconvenient Truths of Ground Truth for Binary Analysis [3.198144010381572]
We show that not all ground truths are created equal.
This paper challenges the binary analysis community to take a long look at the concept of ground truth.
arXiv Detail & Related papers (2022-10-26T23:27:57Z)
- Truthful AI: Developing and governing AI that does not lie [0.26385121748044166]
Lying -- the use of verbal falsehoods to deceive -- is harmful.
While lying has traditionally been a human affair, AI systems are becoming increasingly prevalent.
This raises the question of how we should limit the harm caused by AI "lies".
arXiv Detail & Related papers (2021-10-13T12:18:09Z)
- Towards Abstract Relational Learning in Human Robot Interaction [73.67226556788498]
Humans have a rich representation of the entities in their environment.
If robots need to interact successfully with humans, they need to represent entities, attributes, and generalizations in a similar way.
In this work, we address the problem of how to obtain these representations through human-robot interaction.
arXiv Detail & Related papers (2020-11-20T12:06:46Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.