The Inconvenient Truths of Ground Truth for Binary Analysis
- URL: http://arxiv.org/abs/2210.15079v1
- Date: Wed, 26 Oct 2022 23:27:57 GMT
- Title: The Inconvenient Truths of Ground Truth for Binary Analysis
- Authors: Jim Alves-Foss, Varsha Venugopal
- Abstract summary: We show that not all ground truths are created equal.
This paper challenges the binary analysis community to take a long look at the concept of ground truth.
- Score: 3.198144010381572
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The effectiveness of binary analysis tools and techniques is often measured with respect to how well they map to a ground truth. We have found that not all ground truths are created equal. This paper challenges the binary analysis community to take a long look at the concept of ground truth, to ensure that we agree on the definition(s) of ground truth, so that we can be confident in the evaluation of tools and techniques. This becomes even more important as we move to trained machine learning models, which are only as useful as the validity of the ground truth used in their training.
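As a rough illustration of the kind of evaluation the abstract refers to (a sketch, not taken from the paper), the snippet below scores a hypothetical tool's recovered function start addresses against two different candidate ground truths. All addresses, the choice of function starts as the recovered artifact, and the ground-truth sources named in the comments are illustrative assumptions.

```python
# Minimal sketch (not from the paper): scoring a binary analysis tool's output
# against a chosen ground truth. The addresses below are hypothetical; in
# practice the ground truth might come from compiler debug info, linker maps,
# or hand labeling, and those sources can disagree with one another.

def score_against_ground_truth(recovered, ground_truth):
    """Return (precision, recall, f1) for a set of recovered addresses."""
    recovered, ground_truth = set(recovered), set(ground_truth)
    true_pos = len(recovered & ground_truth)
    precision = true_pos / len(recovered) if recovered else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: the same tool output scored against two different
# "ground truths" yields different effectiveness numbers.
tool_output   = {0x1000, 0x1040, 0x10a0, 0x1100}
truth_debug   = {0x1000, 0x1040, 0x10a0, 0x10c0}   # e.g., from DWARF debug info
truth_symbols = {0x1000, 0x1040, 0x1100}           # e.g., from the symbol table

print(score_against_ground_truth(tool_output, truth_debug))    # (0.75, 0.75, 0.75)
print(score_against_ground_truth(tool_output, truth_symbols))  # (0.75, 1.0, ~0.86)
```

The two scores differ even though the tool output is identical, which is exactly the ambiguity the paper highlights: reported effectiveness depends on which ground truth the evaluator chooses.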
Related papers
- Explaining Necessary Truths [0.0]
We present a framework, based in computational complexity, where explanations for deductive truths co-emerge with discoveries of simplifying steps during the search process.
We simulate human subjects, using GPT-4o, presented with SAT puzzles of varying complexity and reasonableness, validating our theory and showing how its predictions can be tested in future human studies.
arXiv Detail & Related papers (2025-02-16T20:11:39Z) - LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small amount of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z) - Robust NAS under adversarial training: benchmark, theory, and beyond [55.51199265630444]
We release a comprehensive data set that encompasses both clean accuracy and robust accuracy for a vast array of adversarially trained networks.
We also establish a generalization theory for searching architecture in terms of clean accuracy and robust accuracy under multi-objective adversarial training.
arXiv Detail & Related papers (2024-03-19T20:10:23Z) - GRATH: Gradual Self-Truthifying for Large Language Models [63.502835648056305]
GRAdual self-truTHifying (GRATH) is a novel post-processing method to enhance the truthfulness of large language models (LLMs).
GRATH iteratively refines truthfulness data and updates the model, leading to a gradual improvement in model truthfulness in a self-supervised manner.
GRATH achieves state-of-the-art performance on TruthfulQA, with MC1 accuracy of 54.71% and MC2 accuracy of 69.10%, surpassing even 70B LLMs.
arXiv Detail & Related papers (2024-01-22T19:00:08Z) - A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z) - Personas as a Way to Model Truthfulness in Language Models [23.86655844340011]
Large language models (LLMs) are trained on vast amounts of text from the internet.
This paper presents an explanation for why LMs appear to know the truth despite not being trained with truth labels.
arXiv Detail & Related papers (2023-10-27T14:27:43Z) - The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets [6.732432949368421]
Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods.
Recent work has developed techniques for inferring whether an LLM is telling the truth by training probes on the LLM's internal activations.
We present evidence that at sufficient scale, LLMs linearly represent the truth or falsehood of factual statements.
arXiv Detail & Related papers (2023-10-10T17:54:39Z) - Physics of Language Models: Part 3.2, Knowledge Manipulation [51.68385617116854]
This paper investigates four fundamental knowledge manipulation tasks.
We show that language models excel in knowledge retrieval but struggle even in the simplest classification or comparison tasks.
Our findings also apply to modern pretrained language models such as GPT-4.
arXiv Detail & Related papers (2023-09-25T17:50:41Z) - Truth Machines: Synthesizing Veracity in AI Language Models [0.0]
We discuss the struggle for truth in AI systems and the general responses to date.
We then investigate the production of truth in InstructGPT, a large language model.
We argue that these same logics and inconsistencies play out in ChatGPT, reiterating truth as a non-trivial problem.
arXiv Detail & Related papers (2023-01-28T02:47:50Z) - Probing Factually Grounded Content Transfer with Factual Ablation [68.78413677690321]
Grounded generation draws on a reliable external document (grounding) for factual information.
Measuring factuality is also simplified to factual consistency: testing whether the generation agrees with the grounding, rather than with all facts.
We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding.
arXiv Detail & Related papers (2022-03-18T19:18:54Z) - Saliency for free: Saliency prediction as a side-effect of object recognition [4.609056834401648]
We show that saliency maps can be generated as a side-effect of training an object recognition deep neural network.
Such a network does not require any ground-truth saliency maps for training.
Extensive experiments carried out on both real and synthetic saliency datasets demonstrate that our approach is able to generate accurate saliency maps.
arXiv Detail & Related papers (2021-07-20T17:17:28Z)