Related papers: Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models

URL: http://arxiv.org/abs/2305.17077v1
Date: Fri, 26 May 2023 16:36:55 GMT
Title: Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models
Authors: Daman Arora and Subbarao Kambhampati
Abstract summary: We empirically demonstrate that the performance of a finetuned baseline remains poor because it violates pre-conditions of actions in the plans that it generates. To improve the planning capabilities of a finetuned LLM, we train a verifier, which can classify actions as being valid or invalid in a particular state. In the presence of diverse sampling from a generator and a verifier, we show significant gains in the success rate on the Blocksworld domain.
Score: 20.13307800821161
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There have been widespread claims in the literature about the emergent reasoning capabilities of Pretrained Large Language Models. However, recent studies have found that their ability to plan remains questionable. Through our experiments using GPT-2, we empirically demonstrate that the performance of a finetuned baseline remains poor because it violates pre-conditions of actions in the plans that it generates. To improve the planning capabilities of a finetuned LLM, we train a verifier, which can classify actions as being valid or invalid in a particular state. By randomly sampling actions from the same dataset, we generate examples of invalid actions which are then used to train a verifier which can check for action applicability. In the presence of diverse sampling from a generator and a verifier which can prune invalid trajectories, we show significant gains in the success rate on the Blocksworld domain. Additionally, we show that finetuning the GPT-2 generator itself to create the verifier generalizes better than finetuning the base GPT-2. Lastly, we investigate the role of the sampling temperature which can be used to control the exploration-exploitation tradeoff.
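
The generate-then-verify loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy Blocksworld state (a tuple of stacks), a symbolic precondition check standing in for the learned verifier, and a hypothetical `generator` callable that returns a score per candidate action in place of the finetuned GPT-2. All function names are illustrative.

```python
import math
import random

# Toy state: a tuple of stacks, each stack a tuple of block names, bottom first.
def valid_actions(state):
    """All (block, destination_stack) moves whose preconditions hold."""
    moves = []
    for i, src in enumerate(state):
        if not src:
            continue
        block = src[-1]  # only a clear (top) block may be moved
        for j in range(len(state)):
            if i != j:
                moves.append((block, j))
    return moves

def verifier(state, action):
    """Stand-in for the learned verifier: True if the action is applicable."""
    return action in valid_actions(state)

def apply_action(state, action):
    """Move the named top block onto the destination stack."""
    block, dest = action
    stacks = [list(s) for s in state]
    for s in stacks:
        if s and s[-1] == block:
            s.pop()
            break
    stacks[dest].append(block)
    return tuple(tuple(s) for s in stacks)

def sample_action(scores, temperature, rng):
    """Softmax sampling over scores; a higher temperature explores more."""
    actions = list(scores)
    weights = [math.exp(scores[a] / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def generate_plan(state, goal, generator, temperature=1.0,
                  max_steps=20, retries=10, seed=0):
    """Sample actions step by step, discarding those the verifier rejects."""
    rng = random.Random(seed)
    plan = []
    for _ in range(max_steps):
        if state == goal:
            return plan
        for _ in range(retries):
            action = sample_action(generator(state), temperature, rng)
            if verifier(state, action):  # prune invalid actions
                break
        else:
            return None  # no valid action found within the retry budget
        state = apply_action(state, action)
        plan.append(action)
    return plan if state == goal else None
```

In this sketch, raising `temperature` flattens the sampling distribution, so the generator proposes more diverse actions for the verifier to filter. The temperature must be positive.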

Related papers

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning [12.83211408922535]
Reinforcement learning-style post-training improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. We propose Self-Explanation Policy Optimization (ExPO), a simple and modular framework that generates such samples by conditioning on the ground-truth answer.
arXiv Detail & Related papers (2025-07-03T17:44:55Z)
Reliable Few-shot Learning under Dual Noises [166.53173694689693]
We propose DEnoised Task Adaptation (DETA++) for reliable few-shot learning. DETA++ employs a memory bank to store and refine clean regions for each inner-task class, based on which a Local Nearest Centroid Classifier (LocalNCC) is devised to yield noise-robust predictions on query samples. Extensive experiments demonstrate the effectiveness and flexibility of DETA++.
arXiv Detail & Related papers (2025-06-19T14:05:57Z)
Random Initialization Can't Catch Up: The Advantage of Language Model Transfer for Time Series Forecasting [12.230245646429324]
Recent works have demonstrated the effectiveness of adapting pre-trained language models (LMs) for forecasting time series in the low-data regime. We build upon these findings by analyzing the effective transfer from language models to time series forecasting under various design choices.
arXiv Detail & Related papers (2025-06-12T18:39:38Z)
Improving Large Language Model Planning with Action Sequence Similarity [50.52049888490524]
In this work, we explore how to improve the model planning capability through in-context learning (ICL). We propose GRASE-DC: a two-stage pipeline that first re-samples high AS exemplars and then curates the selected exemplars. Our experimental results confirm that GRASE-DC achieves significant performance improvement on various planning tasks.
arXiv Detail & Related papers (2025-05-02T05:16:17Z)
TSCAN: Context-Aware Uplift Modeling via Two-Stage Training for Online Merchant Business Diagnosis [2.8438369256032416]
We propose a Context-Aware uplift model based on the Two-Stage training approach (TSCAN). In the first stage, we train an uplift model, called CAN-U, which includes the treatment regularizations of IPM and propensity score prediction. In the second stage, we train a model named CAN-D, which utilizes an isotonic output layer to directly model uplift effects.
arXiv Detail & Related papers (2025-04-26T10:00:16Z)
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? [32.04523360747506]
We construct a dataset using 50 1B parameter LLM variants with systematically varied pre-training configurations. We introduce novel unsupervised and supervised proxy metrics derived from pre-training that successfully reduce the relative performance prediction error rate by over 50%.
arXiv Detail & Related papers (2025-04-16T21:19:09Z)
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
Towards Pattern-aware Data Augmentation for Temporal Knowledge Graph Completion [18.51546761241817]
We introduce Booster, the first data augmentation strategy for temporal knowledge graphs. We propose a hierarchical scoring algorithm based on triadic closures within TKGs. We also propose a two-stage training approach to identify samples that deviate from the model's preferred patterns.
arXiv Detail & Related papers (2024-12-31T03:47:19Z)
Controlling Language and Diffusion Models by Transporting Activations [23.352500740697938]
We introduce Activation Transport (AcT), a framework to steer activations guided by optimal transport theory. We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is).
arXiv Detail & Related papers (2024-10-30T14:21:33Z)
Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
Heuristic-enhanced Candidates Selection strategy for GPTs tackle Few-Shot Aspect-Based Sentiment Analysis [1.5020330976600738]
The paper designs a Heuristic-enhanced Candidates Selection strategy and further proposes an All in One (AiO) model based on it. The model works in two stages, simultaneously accommodating the accuracy of PLMs and the capability to generalize. The experimental results demonstrate that the proposed model adapts better to multiple sub-tasks and also outperforms methods that directly utilize GPTs.
arXiv Detail & Related papers (2024-04-09T07:02:14Z)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT). We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data. Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks. However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z)
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance. We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective. We also find that prompt tuning is more suitable for GPT-2 than BERT, and that these methods are less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation. We observe two drawbacks stemming from their training procedure and loss definition, which hinder their direct deployment as proxies in black-box optimisation. We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser that trades off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities. We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention. Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
Learning to Adapt to Unseen Abnormal Activities under Weak Supervision [43.40900198498228]
We present a meta-learning framework for weakly supervised anomaly detection in videos. Our framework learns to adapt to unseen types of abnormal activities effectively when only video-level annotations of binary labels are available.
arXiv Detail & Related papers (2022-03-25T12:15:44Z)
EFSG: Evolutionary Fooling Sentences Generator [5.763228702181544]
Evolutionary Fooling Sentences Generator (EFSG) is a model- and task-agnostic adversarial attack algorithm built using an evolutionary approach. We apply EFSG to CoLA and MRPC tasks, on BERT and RoBERTa, comparing performances. We obtain stronger models with no loss of accuracy when tested on the original datasets.
arXiv Detail & Related papers (2020-10-12T14:28:48Z)
Unsupervised Paraphrase Generation using Pre-trained Language Models [0.0]
OpenAI's GPT-2 is notable for its capability to generate fluent, well-formulated, grammatically consistent text. We leverage this generation capability of GPT-2 to generate paraphrases without any supervision from labelled data. Our experiments show that paraphrases generated with our model are of good quality, are diverse, and improve downstream task performance when used for data augmentation.
arXiv Detail & Related papers (2020-06-09T19:40:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.