Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2311.17514v1
- Date: Wed, 29 Nov 2023 10:38:16 GMT
- Title: Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning
- Authors: Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya
- Abstract summary: We deal with Query-focused Summarization (QfS), where systems generate summaries from document(s) based on a query.
Motivated by the insight that Reinforcement Learning (RL) provides a generalization to Supervised Learning (SL) for Natural Language Generation, we use an RL-based approach for this task.
We develop multiple Policy Gradient networks, trained on various reward signals: ROUGE, BLEU, and Semantic Similarity.
- Score: 43.123290672073814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Query-focused Summarization (QfS) deals with systems that generate summaries
from document(s) based on a query. Motivated by the insight that Reinforcement
Learning (RL) provides a generalization to Supervised Learning (SL) for Natural
Language Generation, and thereby performs better (empirically) than SL, we use
an RL-based approach for this task of QfS. Additionally, we resolve the
conflict between RL training and Teacher Forcing in Transformers. We develop
multiple Policy Gradient networks, trained on various reward signals: ROUGE,
BLEU, and Semantic Similarity, which lead to a 10-point improvement over the
State-of-the-Art approach on the ROUGE-L metric for a benchmark dataset (ELI5).
We also show the performance of our approach in a zero-shot setting on another
benchmark dataset (DebatePedia); our approach yields results comparable to
baselines that were specifically trained on DebatePedia. To aid the RL
training, we propose a better semantic similarity reward, enabled by a novel
Passage Embedding scheme developed using the Cluster Hypothesis. Lastly, we
contribute a gold-standard test dataset to further research in QfS and
Long-form Question Answering (LfQA).
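To make the training recipe in the abstract concrete, here is a minimal sketch of REINFORCE-style policy-gradient training with a ROUGE-L reward. It is not the authors' code: the toy GRU decoder, vocabulary, hyper-parameters, and the unconditional rollout (which ignores the query/document conditioning and the Transformer/Teacher-Forcing issue the paper addresses) are illustrative assumptions.

```python
# Minimal sketch: sequence-level reward (ROUGE-L style F-score) plugged into
# a REINFORCE update. Illustrative only; not the paper's architecture.
import torch
import torch.nn as nn
from torch.distributions import Categorical

def rouge_l_f(candidate, reference, beta=1.2):
    """LCS-based F-score between two token lists (ROUGE-L style)."""
    m, n = len(candidate), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if candidate[i] == reference[j] \
                else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    p, r = lcs / m, lcs / n
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)

class ToyDecoder(nn.Module):
    """Toy unconditional 'policy' over a small vocabulary; a real QfS model
    would condition on the query and document through an encoder."""
    def __init__(self, vocab_size=50, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def rollout(self, max_len=12):
        h = torch.zeros(1, self.rnn.hidden_size)
        tok = torch.zeros(1, dtype=torch.long)   # BOS token id 0 (assumption)
        log_probs, tokens = [], []
        for _ in range(max_len):
            h = self.rnn(self.embed(tok), h)
            dist = Categorical(logits=self.out(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            tokens.append(tok.item())
        return tokens, torch.stack(log_probs).sum()

policy = ToyDecoder()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
reference = [3, 7, 7, 9, 4, 2]                   # stand-in reference summary

for step in range(200):
    tokens, log_prob = policy.rollout()
    reward = rouge_l_f(tokens, reference)        # sequence-level reward signal
    loss = -reward * log_prob                    # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Swapping rouge_l_f for a BLEU or embedding-similarity function would yield the other reward signals mentioned in the abstract.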
Related papers
- CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering [33.89497991289916] (2024-09-29)
We propose a novel rewriting method, CoTKR (Chain-of-Thought Enhanced Knowledge Rewriting), for generating reasoning traces and the corresponding knowledge in an interleaved manner.
We conduct experiments using various Large Language Models (LLMs) across several Knowledge Graph Question Answering (KGQA) benchmarks.
- W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering [28.79851078451609] (2024-08-15)
Large Language Models (LLMs) often struggle to generate factual answers relying solely on their internal (parametric) knowledge.
To address this limitation, Retrieval-Augmented Generation (RAG) systems enhance LLMs by retrieving relevant information from external sources.
We propose W-RAG, which utilizes the ranking capabilities of LLMs to create weakly labeled data for training dense retrievers.
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777] (2024-02-22)
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327] (2023-08-21)
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision-making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758] (2021-06-14)
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401] (2021-02-24)
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
- SIBRE: Self Improvement Based REwards for Adaptive Feedback in Reinforcement Learning [5.868852957948178] (2020-04-21)
We propose a generic reward-shaping approach for improving the rate of convergence in reinforcement learning (RL).
The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance (a minimal illustrative sketch of this shaping rule appears after this list).
We prove that SIBRE converges in expectation under the same conditions as the original RL algorithm.
- Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing [2.5137859989323537] (2020-04-16)
A large number of policy networks are generated by randomly guessing their parameters and then evaluated on the benchmark task (a sketch of this procedure appears after this list).
We show that this approach isolates the environment complexity, highlights specific types of challenges, and provides a proper foundation for the statistical analysis of the task's difficulty.
We test our approach on a variety of classic control benchmarks from the OpenAI Gym, where we show that small untrained networks can provide a robust baseline for a variety of tasks.
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284] (2020-01-20)
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
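The SIBRE entry above describes rewarding improvement over the agent's own past performance. The following is one minimal interpretation of that shaping rule, not the authors' implementation; the window size and the episode-level attribution of the bonus are assumptions.

```python
# Sketch of a self-improvement-based shaped reward: the bonus at episode end
# is the improvement of the current return over a running baseline of past
# returns. Illustrative interpretation only.
from collections import deque

class SelfImprovementReward:
    def __init__(self, window=20):
        self.past_returns = deque(maxlen=window)

    def shape(self, episode_return):
        if not self.past_returns:
            bonus = 0.0
        else:
            baseline = sum(self.past_returns) / len(self.past_returns)
            bonus = episode_return - baseline   # reward improvement over own past
        self.past_returns.append(episode_return)
        return bonus

# Usage inside any RL loop: feed the shaped value to the learner at episode end.
shaper = SelfImprovementReward()
for episode_return in [10.0, 12.0, 9.0, 15.0]:  # stand-in episode returns
    print(shaper.shape(episode_return))
```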
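The random-weight-guessing entry above describes evaluating many randomly parameterized policies on a benchmark task. The sketch below illustrates that procedure on CartPole; the use of the gymnasium package, the linear policy, and the sample counts are assumptions rather than the paper's exact protocol.

```python
# Sketch: evaluate many randomly guessed linear policies on CartPole and
# summarize the resulting return distribution. Illustrative only.
import numpy as np
import gymnasium as gym

def episode_return(env, weights, max_steps=500):
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = int(np.argmax(weights @ obs))   # linear policy, no training
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total

env = gym.make("CartPole-v1")
n_actions = env.action_space.n
obs_dim = env.observation_space.shape[0]

returns = []
for _ in range(200):                             # many randomly guessed policies
    weights = np.random.randn(n_actions, obs_dim)
    returns.append(np.mean([episode_return(env, weights) for _ in range(3)]))

# The spread of returns characterizes how hard the task is for untrained,
# randomly parameterized policies.
print("median:", np.median(returns), "best:", np.max(returns))
```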