Benchmarks for Detecting Measurement Tampering
- URL: http://arxiv.org/abs/2308.15605v5
- Date: Fri, 29 Sep 2023 15:53:36 GMT
- Title: Benchmarks for Detecting Measurement Tampering
- Authors: Fabien Roger, Ryan Greenblatt, Max Nadeau, Buck Shlegeris, Nate Thomas
- Abstract summary: We build four new text-based datasets to evaluate measurement tampering detection techniques on large language models.
The goal is to determine if examples where all measurements indicate the outcome occurred actually had the outcome occur, or if this was caused by measurement tampering.
We demonstrate techniques that outperform simple baselines on most datasets, but don't achieve maximum performance.
- Score: 2.9138729302304855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training powerful AI systems to perform complex tasks, it may be
challenging to provide training signals which are robust to optimization. One
concern is measurement tampering, where the AI system manipulates
multiple measurements to create the illusion of good results instead of
achieving the desired outcome. In this work, we build four new text-based
datasets to evaluate measurement tampering detection techniques on large
language models. Concretely, given sets of text inputs and measurements aimed
at determining if some outcome occurred, as well as a base model able to
accurately predict measurements, the goal is to determine if examples where all
measurements indicate the outcome occurred actually had the outcome occur, or
if this was caused by measurement tampering. We demonstrate techniques that
outperform simple baselines on most datasets, but don't achieve maximum
performance. We believe there is significant room for improvement for both
techniques and datasets, and we are excited for future work tackling
measurement tampering.
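As a rough illustration of the task setup, the sketch below assumes a simplified example schema (text input, per-measurement readings, ground-truth outcome) and scores a detector only on the hard subset where every measurement reports success; the field names and the AUROC metric are illustrative assumptions, not the paper's actual dataset format.
```python
# Hypothetical sketch of the detection task; names are illustrative, not the
# paper's actual dataset format.
from dataclasses import dataclass
from typing import List

from sklearn.metrics import roc_auc_score


@dataclass
class Example:
    text: str                 # input shown to the AI system
    measurements: List[bool]  # what each (possibly tampered) measurement reports
    outcome: bool             # ground truth: did the desired outcome actually occur?


def detection_auroc(examples: List[Example], suspicion: List[float]) -> float:
    """Evaluate a tampering detector on examples where every measurement claims
    success: a good detector assigns higher suspicion to examples whose
    measurements were tampered (outcome is actually False)."""
    subset = [(ex, s) for ex, s in zip(examples, suspicion) if all(ex.measurements)]
    labels = [0 if ex.outcome else 1 for ex, _ in subset]  # 1 = tampered
    scores = [s for _, s in subset]
    return roc_auc_score(labels, scores)
```
Only the all-measurements-positive subset matters here, because that is exactly where genuine success and successful tampering look the same to the measurements.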
Related papers
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
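A minimal sketch of one way to bucket examples by training dynamics, assuming per-epoch gold-label probabilities were logged during training; this mirrors dataset-cartography-style statistics and is not necessarily the paper's exact characterization method.
```python
# Bucket examples into difficulty levels from logged training dynamics.
# Assumes gold_probs[e, i] = model's probability of the gold label for example i at epoch e.
import numpy as np


def difficulty_levels(gold_probs: np.ndarray, n_levels: int = 3) -> np.ndarray:
    """Return an integer difficulty level per example (0 = easiest)."""
    confidence = gold_probs.mean(axis=0)      # consistently high -> easy example
    order = np.argsort(-confidence)           # easiest first
    levels = np.empty(order.shape, dtype=int)
    for level, chunk in enumerate(np.array_split(order, n_levels)):
        levels[chunk] = level
    return levels
```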
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
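The kind of contrastive check such a challenge set enables can be sketched as follows; the `metric` callable and the tuple layout are assumptions for illustration, not the ACES evaluation code.
```python
# A metric "identifies" an accuracy error if it scores the good translation
# above the incorrect one for the same source/reference pair.
from typing import Callable, List, Tuple

Metric = Callable[[str, str, str], float]  # (source, reference, hypothesis) -> score


def error_detection_rate(metric: Metric,
                         cases: List[Tuple[str, str, str, str]]) -> float:
    """cases: (source, reference, good_hypothesis, incorrect_hypothesis) tuples."""
    hits = sum(metric(src, ref, good) > metric(src, ref, bad)
               for src, ref, good, bad in cases)
    return hits / len(cases)
```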
arXiv Detail & Related papers (2024-01-29T17:17:42Z)
- Randomized compiling for subsystem measurements [0.0]
We introduce a new technique based on randomized compiling to transform errors in measurements into a simple form that removes particularly harmful effects.
We show that our technique reduces generic errors in a computational basis measurement to act like a confusion matrix.
We demonstrate that a simple and realistic noise model can cause errors that are harmful and difficult to model.
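To see why reducing measurement errors to a confusion matrix is convenient, note that such errors can be characterized and then inverted in post-processing; the single-qubit numbers below are invented for illustration and are not the paper's construction.
```python
import numpy as np

# Confusion matrix: A[i, j] = P(report outcome i | true outcome j), one qubit.
A = np.array([[0.97, 0.05],
              [0.03, 0.95]])

true_probs = np.array([0.6, 0.4])         # ideal outcome distribution
observed = A @ true_probs                 # what the noisy measurement reports
recovered = np.linalg.solve(A, observed)  # invert the confusion-matrix error
print(observed, recovered)                # recovered ~= true_probs
```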
arXiv Detail & Related papers (2023-04-13T15:06:11Z)
- On the impact of dataset size and class imbalance in evaluating machine-learning-based windows malware detection techniques [0.0]
Some researchers use smaller datasets; if dataset size has a significant impact on measured performance, this makes comparison of published results difficult.
The project identified two key objectives, one of which was to understand whether dataset size correlates with measured detector performance to an extent that prevents meaningful comparison of published results.
Results suggested that high accuracy scores don't necessarily translate to high real-world performance.
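A toy illustration of that gap: with heavy class imbalance, a detector that flags nothing still scores 99% accuracy while catching zero malware (the numbers are invented).
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 990 benign samples, 10 malware samples; a "detector" that labels everything benign.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))                    # 0.99 -> looks great
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0  -> catches no malware
print(recall_score(y_true, y_pred))                      # 0.0
```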
arXiv Detail & Related papers (2022-06-13T15:37:31Z)
- Evaluating BERT-based Pre-training Language Models for Detecting Misinformation [2.1915057426589746]
It is challenging to control the quality of online information due to the lack of supervision over all the information posted online.
There is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation.
This study proposes using BERT-based pre-trained language models to encode text data into vectors, and neural network classifiers over these vectors to detect misinformation.
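A minimal sketch of that pipeline, assuming a Hugging Face BERT encoder and an untrained linear head; in practice the head (and usually the encoder) would be fine-tuned on labelled rumour data.
```python
# Encode posts with a BERT-style model, then classify the [CLS] vector.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # real vs. misinformation


def predict(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls_vectors = encoder(**batch).last_hidden_state[:, 0]  # one vector per text
        return classifier(cls_vectors).softmax(dim=-1)


print(predict(["Example post one.", "Example post two."]))
```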
arXiv Detail & Related papers (2022-03-15T08:54:36Z)
- Data-Centric Machine Learning in the Legal Domain [0.2624902795082451]
This paper explores how changes in a data set influence the measured performance of a model.
Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance.
The observed effects are surprisingly pronounced, especially when the per-class performance is considered.
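A sketch of one such intervention, corrupting a fraction of training labels on synthetic placeholder data and reporting per-class F1; the data and classifier stand in for the legal datasets and models studied in the paper.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.1, 0.3):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = rng.integers(0, 3, flip.sum())  # random labels (may hit the original)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
    print(noise, f1_score(y_te, model.predict(X_te), average=None))  # per-class F1
```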
arXiv Detail & Related papers (2022-01-17T23:05:14Z)
- Robust Event Classification Using Imperfect Real-world PMU Data [58.26737360525643]
We study robust event classification using imperfect real-world phasor measurement unit (PMU) data.
We develop a novel machine learning framework for training robust event classifiers.
arXiv Detail & Related papers (2021-10-19T17:41:43Z)
- False perfection in machine prediction: Detecting and assessing circularity problems in machine learning [11.878820609988695]
We demonstrate a problem of machine learning in vital application areas such as medical informatics or patent law.
The inclusion of measurements on which target outputs are deterministically defined in the representations of input data leads to perfect, but circular predictions.
We argue that transferring research results to real-world applications requires avoiding circularity by separating measurements that define target outcomes from data representations.
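A toy demonstration of the circularity problem on synthetic data: leaving a target-defining "measurement" in the input representation yields near-perfect but meaningless test scores.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_real = rng.normal(size=(1000, 5))                        # genuine input features
y = (X_real[:, 0] + rng.normal(scale=2.0, size=1000) > 0).astype(int)
X_leaky = np.hstack([X_real, y.reshape(-1, 1)])            # target-defining column included

clean = LogisticRegression().fit(X_real[:800], y[:800])
leaky = LogisticRegression().fit(X_leaky[:800], y[:800])
print(clean.score(X_real[800:], y[800:]))   # honest, imperfect accuracy
print(leaky.score(X_leaky[800:], y[800:]))  # ~1.0, but circular and useless in deployment
```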
arXiv Detail & Related papers (2021-06-23T14:11:06Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
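A very rough sketch of the underlying idea, using MC dropout as the Bayesian approximation and the model's own predictive confidence as an accuracy estimate on unlabeled data; this is an assumption-laden simplification, not the paper's full active-testing procedure.
```python
import torch


def estimate_accuracy(model: torch.nn.Module, inputs: torch.Tensor, mc_samples: int = 20) -> float:
    """Assumes `model` maps inputs to logits and contains dropout layers."""
    model.train()  # keep dropout active so repeated passes sample from the BNN approximation
    with torch.no_grad():
        probs = torch.stack([model(inputs).softmax(-1) for _ in range(mc_samples)]).mean(0)
    # If the predictive distribution is well calibrated, the expected accuracy is the
    # mean probability assigned to the predicted class.
    return probs.max(dim=-1).values.mean().item()
```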
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is critical to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
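The core idea might be sketched as below: a small network maps per-sample loss signals to a weight that rescales that sample's losses. The architecture and inputs here are assumptions, not the paper's network.
```python
import torch


class SampleWeightNet(torch.nn.Module):
    def __init__(self, n_signals: int = 4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_signals, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

    def forward(self, loss_signals: torch.Tensor) -> torch.Tensor:
        # loss_signals: (n_samples, n_signals), e.g. per-sample cls/reg losses and IoU
        return torch.nn.functional.softplus(self.net(loss_signals)).squeeze(-1)


weights = SampleWeightNet()(torch.randn(8, 4))    # one weight per sample
weighted_loss = (weights * torch.rand(8)).mean()  # rescale per-sample losses
```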
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
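A sketch of that transductive update, with a distance softmax standing in for the meta-learned confidence; shapes, names, and the temperature are illustrative assumptions.
```python
import torch


def refine_prototypes(prototypes: torch.Tensor,  # (n_classes, dim), from the support set
                      queries: torch.Tensor,     # (n_queries, dim), unlabeled embeddings
                      temperature: float = 10.0) -> torch.Tensor:
    dists = torch.cdist(queries, prototypes)             # (n_queries, n_classes)
    conf = torch.softmax(-dists / temperature, dim=-1)   # per-query soft class assignment
    weighted = conf.T @ queries                          # confidence-weighted query sum
    counts = conf.sum(dim=0, keepdim=True).T             # (n_classes, 1)
    return (prototypes + weighted) / (1.0 + counts)      # blend support and query evidence
```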
arXiv Detail & Related papers (2020-02-27T10:22:17Z)