GROOT: Corrective Reward Optimization for Generative Sequential Labeling
- URL: http://arxiv.org/abs/2209.14694v1
- Date: Thu, 29 Sep 2022 11:35:47 GMT
- Title: GROOT: Corrective Reward Optimization for Generative Sequential Labeling
- Authors: Kazuma Hashimoto and Karthik Raman
- Abstract summary: We propose GROOT -- a framework for Generative Reward Optimization Of Text sequences.
GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function.
As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics.
- Score: 10.306943706927004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential labeling is a fundamental NLP task, forming the backbone of many
applications. Supervised learning of Seq2Seq models (like T5) has shown great
success on these problems. However, there remains a significant disconnect
between the training objectives of these models and the metrics and desiderata
we care about in practical applications. For example, a practical sequence
tagging application may want to optimize for a certain precision-recall
trade-off (of the top-k predictions) which is quite different from the standard
objective of maximizing the likelihood of the gold labeled sequence. Thus to
bridge this gap, we propose GROOT -- a simple yet effective framework for
Generative Reward Optimization Of Text sequences. GROOT works by training a
generative sequential labeling model to match the decoder output distribution
with that of the (black-box) reward function. Using an iterative training
regime, we first generate prediction candidates, then correct errors in them,
and finally contrast those candidates (based on their reward values). As
demonstrated via extensive experiments on four public benchmarks, GROOT
significantly improves all reward metrics. Furthermore, GROOT also leads to
improvements of the overall decoder distribution as evidenced by the quality
gains of the top-$k$ candidates.
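The abstract outlines an iterative generate, correct, and contrast regime but gives no implementation details. Below is a minimal sketch, in Python/PyTorch, of what one such training step could look like. Every name here (`reward_fn`, `correct_fn`, `model.generate`, `model.sequence_log_prob`) is a hypothetical placeholder rather than the authors' actual API, and the listwise cross-entropy loss is just one plausible way to match the decoder distribution to the reward distribution.

```python
# Hypothetical sketch of one GROOT-style training step (generate ->
# correct -> contrast). All names and the exact loss are assumptions;
# the paper treats the reward function as a black box.
import torch

def groot_step(model, inputs, gold, reward_fn, correct_fn, k=8):
    # 1) Generate k candidate label sequences per input
    #    (e.g. via sampling or beam search).
    candidates = model.generate(inputs, num_candidates=k)

    # 2) "Correct" the candidates, e.g. by splicing in gold labels where
    #    a candidate disagrees, to obtain higher-reward variants.
    corrected = [correct_fn(c, gold) for c in candidates]
    pool = candidates + corrected

    # 3) Contrast: score every candidate with the black-box reward
    #    (which could encode, say, a top-k precision/recall trade-off)
    #    and pull the decoder's distribution over the pool toward the
    #    reward-induced distribution.
    rewards = torch.tensor([reward_fn(c, gold) for c in pool])
    log_probs = torch.stack(
        [model.sequence_log_prob(inputs, c) for c in pool]
    )

    target = torch.softmax(rewards, dim=0)            # reward distribution
    model_dist = torch.log_softmax(log_probs, dim=0)  # renormalized over pool
    loss = -(target * model_dist).sum()               # cross-entropy
    return loss
```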
Related papers
- Multi-head Sequence Tagging Model for Grammatical Error Correction [31.538895931875565]
The Grammatical Error Correction (GEC) problem is a mapping between a source sequence and a target one.
Current sequence tagging approaches still struggle to handle a broad range of grammatical errors, as they are laser-focused on a single task.
We propose a novel multi-head and multi-task learning model to effectively utilize training data and harness the information from related task training signals.
arXiv Detail & Related papers (2024-10-21T20:01:06Z)
- Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning [67.71952251641545]
GPTRec is an alternative to the Top-K model for item-by-item recommendations.
Our experiments on two datasets show that GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.
arXiv Detail & Related papers (2024-03-07T19:47:48Z)
- Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection [49.27380156754935]
We find that the number of DETR queries must be adjusted manually; otherwise, performance degrades to varying degrees.
We propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem.
Our method is simple and effective, and can in theory be plugged into any DETR-based detector to make it query-adaptive.
arXiv Detail & Related papers (2023-10-24T11:00:56Z)
- Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models [76.52997424694767]
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread applications.
Seq2seq pre-trained language models (PLMs) have ushered in a transformative era for KPG, yielding promising performance improvements.
This paper undertakes a systematic analysis of the influence of model selection and decoding strategies on PLM-based KPG.
arXiv Detail & Related papers (2023-10-10T07:34:45Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptation (TTA) methods have been proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
- Heuristic Semi-Supervised Learning for Graph Generation Inspired by Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph.
In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points and consistently outperforms the state of the art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"; a minimal sketch of this idea follows the list.
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
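The last entry above casts document ranking as text generation: the model is asked to emit a relevance word and documents are scored by that word's probability. Below is a minimal sketch, assuming a Hugging Face T5 checkpoint; the prompt template and the "true"/"false" target words are illustrative assumptions rather than a guaranteed reproduction of the paper's exact setup.

```python
# Minimal sketch of seq2seq relevance labeling with "target words".
# The prompt format and token choices here are assumptions for
# illustration; the paper's exact configuration may differ.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def relevance_score(query: str, document: str) -> float:
    # Cast ranking as generation: ask the model for a target word and
    # use the probability of "true" (vs. "false") as the relevance score.
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]

    # One decoder step from the start token; compare the logits of the
    # two candidate target words.
    decoder_input = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input).logits[0, -1]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability mass on "true"
```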