SciFix: Outperforming GPT3 on Scientific Factual Error Correction
- URL: http://arxiv.org/abs/2305.14707v2
- Date: Thu, 12 Oct 2023 21:13:09 GMT
- Title: SciFix: Outperforming GPT3 on Scientific Factual Error Correction
- Authors: Dhananjay Ashok, Atharva Kulkarni, Hai Pham, Barnabás Póczos
- Abstract summary: SciFix is a scientific claim correction system that does not require a verifier but can outperform existing methods by a considerable margin.
Our method leverages the power of prompting with LLMs during training to create a richly annotated dataset.
- Score: 9.850216012914684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the prohibitively high cost of creating error correction datasets,
most Factual Claim Correction methods rely on a powerful verification model to
guide the correction process. This leads to a significant drop in performance
in domains like scientific claims, where good verification models do not always
exist. In this work, we introduce SciFix, a scientific claim correction system
that does not require a verifier but can outperform existing methods by a
considerable margin -- achieving correction accuracy of 84% on the SciFact
dataset, 77% on SciFact-Open and 72% on the CovidFact dataset, compared to next
best accuracies of 7%, 5%, and 15% on the same datasets respectively. Our
method leverages the power of prompting with LLMs during training to create a
richly annotated dataset that can be used for fully supervised training and
regularization. We additionally use a claim-aware decoding procedure to improve
the quality of corrected claims. Our method outperforms the very LLM that was
used to generate the annotated dataset -- with Few-Shot Prompting on GPT3.5
achieving 58%, 61%, and 64% on the respective datasets, a consistently lower
correction accuracy, despite using nearly 800 times as many parameters as our
model.
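The abstract names a claim-aware decoding procedure but does not spell it out. As a rough illustration only, the sketch below re-ranks beam-search candidates so that an output that merely copies the erroneous input claim is demoted; the function, the overlap penalty, and its weight are hypothetical stand-ins, not the authors' method.

```python
# Hypothetical claim-aware re-ranking step; the paper's actual decoding
# procedure may differ. Beam candidates that are near-copies of the input
# claim get their scores lowered.
from difflib import SequenceMatcher

def claim_aware_rerank(claim, candidates, penalty_weight=2.0):
    """Re-rank (text, log_prob) beam candidates, demoting near-copies of the claim."""
    def adjusted(text, logp):
        overlap = SequenceMatcher(None, claim.lower(), text.lower()).ratio()
        return logp - penalty_weight * overlap  # high overlap -> lower score
    return sorted(candidates, key=lambda c: adjusted(*c), reverse=True)

beams = [("COVID-19 vaccines are unsafe.", -1.2),            # copies the claim
         ("COVID-19 vaccines are safe and effective.", -1.5)]
print(claim_aware_rerank("COVID-19 vaccines are unsafe.", beams)[0][0])
```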
Related papers
- Fill In The Gaps: Model Calibration and Generalization with Synthetic Data [2.89287673224661]
We propose a calibration method that incorporates synthetic data without compromising accuracy.
We derive the expected calibration error (ECE) bound using the Probably Approximately Correct (PAC) learning framework.
We observe up to a 34% average increase in accuracy and a 33% decrease in ECE.
arXiv Detail & Related papers (2024-10-07T23:06:42Z)
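Expected calibration error, which the bound above concerns, has a standard definition: bin predictions by confidence and average the per-bin gap between accuracy and mean confidence. A minimal NumPy version of that standard metric (not the paper's code):

```python
# Standard equal-width-bin ECE; illustrative, not the paper's implementation.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum_b (|B_b|/N) * |acc(B_b) - conf(B_b)| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() = bin's share of all samples
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```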
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO.
Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically.
Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
arXiv Detail & Related papers (2024-06-26T17:43:06Z)
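Step-DPO applies the usual DPO preference loss at the level of a single reasoning step rather than a whole answer. A minimal sketch of that loss given policy and reference log-probabilities of a chosen and a rejected step (variable names are illustrative; the paper's training setup has more moving parts):

```python
# Step-level DPO loss from policy and reference log-probs of a preferred
# (chosen) and dispreferred (rejected) reasoning step. Illustrative only.
import math

def step_dpo_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # == -log(sigmoid(margin))

# A correct step the policy already prefers over a faulty one -> small loss.
print(step_dpo_loss(-5.0, -9.0, -6.0, -7.0))
```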
- FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler [29.93307421620845]
FedCal uses client-specific scalers to achieve both local and global calibration.
Experiments demonstrate FedCal significantly outperforms the best-performing baseline, reducing global calibration error by 47.66% on average.
arXiv Detail & Related papers (2024-05-24T11:33:58Z)
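The abstract describes client-specific scalers aggregated into a global one. As one plausible reading only, the sketch below fits a per-client temperature scaler and averages the temperatures by client size; FedCal's actual parameterized scaler and aggregation differ in detail.

```python
# Hypothetical flavor of per-client calibration plus server aggregation;
# not FedCal's actual scaler.
import numpy as np

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the temperature minimizing NLL on one client's held-out data."""
    logits, labels = np.asarray(logits, float), np.asarray(labels)
    def nll(t):
        z = logits / t
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        logprob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logprob[np.arange(len(labels)), labels].mean()
    return float(min(grid, key=nll))

def aggregate_temperatures(temps, sizes):
    """Server side: average client temperatures, weighted by client data size."""
    sizes = np.asarray(sizes, float)
    return float(np.dot(temps, sizes / sizes.sum()))

t_global = aggregate_temperatures(
    [fit_temperature([[2.0, 0.1], [0.2, 1.5]], [0, 1]), 1.8], [2, 8])
print(t_global)
```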
- Reformatted Alignment [27.79684742862816]
Current methods to improve data quality are either labor-intensive or prone to factual errors caused by hallucinations.
This paper introduces a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence.
Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs.
arXiv Detail & Related papers (2024-02-19T15:21:58Z)
- Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
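Selective synaptic dampening, which ASSD extends, scales down parameters that are far more important (by a Fisher-style estimate) to the forget set than to the retained data. A rough sketch of that dampening rule; the fixed alpha and lambda below are exactly the hyperparameters ASSD chooses adaptively:

```python
# Rough sketch of the selective-dampening rule; alpha and lam are the fixed
# hyperparameters that ASSD's adaptive selection replaces.
import numpy as np

def selective_dampening(params, imp_forget, imp_retain, alpha=10.0, lam=1.0):
    """Scale down weights whose forget-set importance dwarfs retain-set importance."""
    params = params.copy()
    select = imp_forget > alpha * imp_retain  # forget-specific weights
    factor = np.minimum(1.0, lam * imp_retain[select]
                             / (imp_forget[select] + 1e-12))
    params[select] *= factor                  # dampen, never amplify
    return params

w = np.ones(4)
print(selective_dampening(w, np.array([5.0, 0.1, 8.0, 0.2]),
                             np.array([0.1, 0.2, 0.1, 0.3])))
```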
- FAIRLABEL: Correcting Bias in Labels [2.810160553339817]
We propose FAIRLABEL, an algorithm that detects and corrects biases in labels.
The goal of FAIRLABEL is to reduce the Disparate Impact (DI) across groups while maintaining high prediction accuracy.
arXiv Detail & Related papers (2023-11-01T16:38:27Z)
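Disparate Impact, the metric FAIRLABEL targets, is the ratio of positive-prediction rates between the unprivileged and privileged groups (1.0 means parity). A minimal computation:

```python
# Standard Disparate Impact metric: DI = P(yhat=1 | unprivileged)
#                                        / P(yhat=1 | privileged).
import numpy as np

def disparate_impact(y_pred, group, unprivileged=0, privileged=1):
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_u = y_pred[group == unprivileged].mean()
    rate_p = y_pred[group == privileged].mean()
    return rate_u / rate_p

print(disparate_impact([1, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]))  # ~0.33
```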
- Machine Learning Force Fields with Data Cost Aware Training [94.78998399180519]
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation.
Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels.
We propose ASTEROID, a multi-stage computational framework that lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
arXiv Detail & Related papers (2023-06-05T04:34:54Z)
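The cheap-plus-accurate-data idea can be illustrated with a toy weighted regression: fit mostly on abundant noisy labels, then refit with a few accurate labels at higher weight. This shows only the general shape, not the paper's ASTEROID framework:

```python
# Toy illustration of mixing abundant noisy labels with a few accurate ones
# via per-sample weights; not the ASTEROID implementation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_cheap = X @ w_true + rng.normal(scale=1.0, size=200)   # noisy labels
X_acc, y_acc = X[:20], X[:20] @ w_true                   # few accurate labels

def weighted_lstsq(Xs, ys, ws):
    """Least squares with per-sample weights (rows scaled by sqrt(weight))."""
    s = np.sqrt(np.concatenate(ws))
    A = np.concatenate(Xs) * s[:, None]
    b = np.concatenate(ys) * s
    return np.linalg.lstsq(A, b, rcond=None)[0]

w_cheap_only = weighted_lstsq([X], [y_cheap], [np.full(200, 1.0)])
w_combined = weighted_lstsq([X, X_acc], [y_cheap, y_acc],
                            [np.full(200, 0.3), np.full(20, 3.0)])
print(np.linalg.norm(w_cheap_only - w_true), np.linalg.norm(w_combined - w_true))
```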
- Boosting Facial Expression Recognition by A Semi-Supervised Progressive Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z)
- Data-Free Quantization with Accurate Activation Clipping and Adaptive Batch Normalization [4.329951775163721]
We present a data-free quantization method with accurate activation clipping and adaptive batch normalization.
Experiments demonstrate that the proposed data-free quantization method can yield surprisingly strong performance, achieving 64.33% top-1 accuracy with ResNet18 on the ImageNet dataset.
arXiv Detail & Related papers (2022-04-08T01:56:51Z)
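Uniform quantization with a clipped range is the generic building block behind such methods. The sketch below uses a naive percentile clip; the paper's contribution is choosing the clipping accurately without data and adapting batch normalization, neither of which is shown here:

```python
# Generic clipped uniform ("fake") quantization; the percentile clip is a
# simple placeholder, not the paper's accurate activation clipping.
import numpy as np

def fake_quantize(x, clip_max, n_bits=8):
    """Clip to [-clip_max, clip_max], round to 2^n_bits uniform levels, dequantize."""
    scale = clip_max / (2 ** (n_bits - 1) - 1)
    q = np.clip(np.round(x / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q * scale

acts = np.random.default_rng(1).normal(size=1000)
clip_max = np.percentile(np.abs(acts), 99.5)  # naive percentile heuristic
print(np.abs(fake_quantize(acts, clip_max) - acts).mean())
```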
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds the threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
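Per the summary above, ATC learns a confidence threshold on labeled source data and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds it. A minimal sketch (the paper also considers other score functions, such as negative entropy):

```python
# Minimal ATC sketch: fit a threshold on source, apply it to unlabeled target.
import numpy as np

def fit_atc_threshold(src_conf, src_correct):
    """Pick t so the source fraction with confidence above t equals source accuracy."""
    acc = float(np.mean(src_correct))
    return float(np.quantile(np.asarray(src_conf), 1.0 - acc))

def predict_target_accuracy(tgt_conf, threshold):
    """Predicted accuracy = fraction of unlabeled target examples above threshold."""
    return float(np.mean(np.asarray(tgt_conf) > threshold))

t = fit_atc_threshold([0.95, 0.9, 0.8, 0.6], [1, 1, 1, 0])  # 75% source accuracy
print(t, predict_target_accuracy([0.92, 0.7, 0.85, 0.5], t))
```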
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate prediction-time batch normalization, which recomputes normalization statistics on the test batch and significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
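Prediction-time batch normalization simply normalizes a test batch with its own statistics instead of the stored training statistics. A minimal NumPy sketch of one such layer:

```python
# Prediction-time batch normalization: use the current test batch's
# mean/variance instead of running statistics from training.
import numpy as np

def prediction_time_batchnorm(x, gamma, beta, eps=1e-5):
    """Normalize a test batch with its own per-feature mean and variance."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

batch = np.random.default_rng(2).normal(loc=3.0, scale=2.0, size=(64, 16))
out = prediction_time_batchnorm(batch, gamma=np.ones(16), beta=np.zeros(16))
print(out.mean(), out.std())  # ~0 and ~1 despite the shifted, scaled input
```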