GROOT: Corrective Reward Optimization for Generative Sequential Labeling
- URL: http://arxiv.org/abs/2209.14694v1
- Date: Thu, 29 Sep 2022 11:35:47 GMT
- Title: GROOT: Corrective Reward Optimization for Generative Sequential Labeling
- Authors: Kazuma Hashimoto and Karthik Raman
- Abstract summary: We propose GROOT -- a framework for Generative Reward Optimization Of Text sequences.
GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function.
As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics.
- Score: 10.306943706927004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential labeling is a fundamental NLP task, forming the backbone of many
applications. Supervised learning of Seq2Seq models (like T5) has shown great
success on these problems. However, there remains a significant disconnect
between the training objectives of these models and the metrics and desiderata
we care about in practical applications. For example, a practical sequence
tagging application may want to optimize for a certain precision-recall
trade-off (of the top-k predictions) which is quite different from the standard
objective of maximizing the likelihood of the gold labeled sequence. Thus to
bridge this gap, we propose GROOT -- a simple yet effective framework for
Generative Reward Optimization Of Text sequences. GROOT works by training a
generative sequential labeling model to match the decoder output distribution
with that of the (black-box) reward function. Using an iterative training
regime, we first generate prediction candidates, then correct errors in them,
and finally contrast those candidates (based on their reward values). As
demonstrated via extensive experiments on four public benchmarks, GROOT
significantly improves all reward metrics. Furthermore, GROOT also leads to
improvements of the overall decoder distribution as evidenced by the quality
gains of the top-$k$ candidates.
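The abstract outlines an iterative generate, correct, and contrast regime but gives no implementation details. Below is a minimal sketch, in Python/PyTorch, of what one such training step could look like. Every name here (`reward_fn`, `correct_fn`, `model.generate`, `model.sequence_log_prob`) is a hypothetical placeholder rather than the authors' actual API, and the listwise cross-entropy loss is just one plausible way to match the decoder distribution to the reward distribution.

```python
# Hypothetical sketch of one GROOT-style training step (generate ->
# correct -> contrast). All names and the exact loss are assumptions;
# the paper treats the reward function as a black box.
import torch

def groot_step(model, inputs, gold, reward_fn, correct_fn, k=8):
    # 1) Generate k candidate label sequences per input
    #    (e.g. via sampling or beam search).
    candidates = model.generate(inputs, num_candidates=k)

    # 2) "Correct" the candidates, e.g. by splicing in gold labels where
    #    a candidate disagrees, to obtain higher-reward variants.
    corrected = [correct_fn(c, gold) for c in candidates]
    pool = candidates + corrected

    # 3) Contrast: score every candidate with the black-box reward
    #    (which could encode, say, a top-k precision/recall trade-off)
    #    and pull the decoder's distribution over the pool toward the
    #    reward-induced distribution.
    rewards = torch.tensor([reward_fn(c, gold) for c in pool])
    log_probs = torch.stack(
        [model.sequence_log_prob(inputs, c) for c in pool]
    )

    target = torch.softmax(rewards, dim=0)            # reward distribution
    model_dist = torch.log_softmax(log_probs, dim=0)  # renormalized over pool
    loss = -(target * model_dist).sum()               # cross-entropy
    return loss
```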
Related papers
- Multi-head Sequence Tagging Model for Grammatical Error Correction [31.538895931875565]
The Grammatical Error Correction (GEC) problem is a mapping between a source sequence and a target one.
Current sequence tagging approaches still struggle to handle a broad range of grammatical errors, as they are laser-focused on a single task.
We propose a novel multi-head and multi-task learning model to effectively utilize training data and harness the information from related task training signals.
arXiv Detail & Related papers (2024-10-21T20:01:06Z)
- Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning [67.71952251641545]
GPTRec is an alternative to the Top-K model for item-by-item recommendations.
Our experiments on two datasets show that GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.
arXiv Detail & Related papers (2024-03-07T19:47:48Z)
- Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection [49.27380156754935]
We find that the number of DETR queries must be adjusted manually; otherwise, performance degrades to varying degrees.
We propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem.
Our method is simple and effective, and can in theory be plugged into any DETR-based detector to make it query-adaptive.
arXiv Detail & Related papers (2023-10-24T11:00:56Z)
- Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models [76.52997424694767]
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread applications.
Seq2seq pre-trained language models (PLMs) have ushered in a transformative era for KPG, yielding promising performance improvements.
This paper undertakes a systematic analysis of the influence of model selection and decoding strategies on PLM-based KPG.
arXiv Detail & Related papers (2023-10-10T07:34:45Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptation (TTA) methods have been proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
- Heuristic Semi-Supervised Learning for Graph Generation Inspired by Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph.
In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points and consistently outperforms the state of the art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"; a minimal sketch of this idea follows the list.
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
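The last entry above casts document ranking as text generation: the model is asked to emit a relevance word and documents are scored by that word's probability. Below is a minimal sketch, assuming a Hugging Face T5 checkpoint; the prompt template and the "true"/"false" target words are illustrative assumptions rather than a guaranteed reproduction of the paper's exact setup.

```python
# Minimal sketch of seq2seq relevance labeling with "target words".
# The prompt format and token choices here are assumptions for
# illustration; the paper's exact configuration may differ.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def relevance_score(query: str, document: str) -> float:
    # Cast ranking as generation: ask the model for a target word and
    # use the probability of "true" (vs. "false") as the relevance score.
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]

    # One decoder step from the start token; compare the logits of the
    # two candidate target words.
    decoder_input = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input).logits[0, -1]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability mass on "true"
```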