Model-Based Simulation for Optimising Smart Reply
- URL: http://arxiv.org/abs/2305.16852v1
- Date: Fri, 26 May 2023 12:04:33 GMT
- Title: Model-Based Simulation for Optimising Smart Reply
- Authors: Benjamin Towle and Ke Zhou
- Abstract summary: Smart Reply (SR) systems present a user with a set of replies, of which one can be selected in place of having to type out a response.
Previous work has focused largely on post-hoc diversification, rather than explicitly learning to predict sets of responses.
We present SimSR, a novel method that employs model-based simulation to discover high-value response sets.
- Score: 3.615981646205045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smart Reply (SR) systems present a user with a set of replies, of which one can be selected in place of having to type out a response. To perform well at this task, a system should be able to effectively present the user with a diverse set of options, to maximise the chance that at least one of them conveys the user's desired response. This is a significant challenge, due to the lack of datasets containing sets of responses to learn from. As a result, previous work has focused largely on post-hoc diversification, rather than explicitly learning to predict sets of responses. Motivated by this problem, we present SimSR, a novel method that employs model-based simulation to discover high-value response sets, by simulating possible user responses with a learned world model. Unlike previous approaches, this allows our method to directly optimise the end goal of SR: maximising the relevance of at least one of the predicted replies. Empirically, on two public datasets, our method achieves up to 21% and 18% improvement in ROUGE score and Self-ROUGE score respectively over SoTA baselines.
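The selection objective described above lends itself to a short illustration. Below is a minimal sketch of simulation-based reply-set selection, assuming hypothetical `simulate_user` (the learned world model) and `relevance` (a pairwise match score) interfaces; the paper's actual search procedure may differ, and the brute-force enumeration here is for clarity only.

```python
import itertools
from typing import Callable, List, Sequence

def best_reply_set(
    message: str,
    candidates: Sequence[str],                  # shortlist from a retrieval model
    simulate_user: Callable[[str], List[str]],  # world model: message -> plausible user replies
    relevance: Callable[[str, str], float],     # candidate-vs-simulated-reply match score
    k: int = 3,
) -> List[str]:
    """Pick the k-reply set whose best member is most relevant on average
    across simulated user replies: the 'at least one reply is right'
    objective from the abstract."""
    simulated = simulate_user(message)
    def set_value(reply_set: List[str]) -> float:
        # Expected max-relevance over simulated user replies.
        return sum(max(relevance(r, s) for r in reply_set) for s in simulated) / len(simulated)
    # Brute-force over all k-subsets; a real system would search approximately.
    return max(
        (list(c) for c in itertools.combinations(candidates, k)),
        key=set_value,
    )
```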
Related papers
- Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization [21.115495457454365]
This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents.
We introduce an iterative approach where the search engine generates retrieval results for these RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase.
We adapt this approach to an online setting, allowing the search engine to refine its behavior based on real-time feedback from individual agents.
arXiv Detail & Related papers (2024-10-13T17:53:50Z)
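A minimal sketch of the offline feedback loop summarised above, assuming hypothetical `search_engine.retrieve`/`search_engine.update` and `agent.feedback` interfaces (the paper's actual learning-to-rank update is more involved):

```python
from collections import defaultdict

def offline_utility_loop(search_engine, rag_agents, queries, rounds=5, lr=0.1):
    """Iteratively tune a shared ranker from per-agent utility feedback."""
    for _ in range(rounds):
        doc_utility = defaultdict(list)
        for agent in rag_agents:
            for query in queries:
                docs = search_engine.retrieve(query, top_k=10)  # doc IDs
                # Each agent reports how useful each document was for its answer.
                for doc, score in agent.feedback(query, docs):
                    doc_utility[(query, doc)].append(score)
        # Nudge the ranker toward documents with high average utility.
        for (query, doc), scores in doc_utility.items():
            search_engine.update(query, doc, lr * sum(scores) / len(scores))
```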
- Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models [19.752712857873043]
This paper introduces Selective Self-Rehearsal (SSR), a fine-tuning approach that achieves performance comparable to standard supervised fine-tuning (SFT).
By utilizing the model's correct responses, SSR reduces model specialization during the fine-tuning stage.
The effectiveness of SSR is demonstrated through experiments on the task of identifying unanswerable queries across various datasets.
arXiv Detail & Related papers (2024-09-07T10:21:03Z)
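A minimal sketch of the selection idea summarised above, assuming hypothetical `model.generate` and `is_correct` stand-ins:

```python
def build_ssr_dataset(model, dataset, is_correct):
    """Selective Self-Rehearsal data selection (sketch).

    For queries the model already answers correctly, rehearse the model's
    own response as the target instead of the gold one, which reduces
    specialization; otherwise fall back to the gold target."""
    training_pairs = []
    for query, gold in dataset:
        response = model.generate(query)
        target = response if is_correct(query, response, gold) else gold
        training_pairs.append((query, target))
    return training_pairs
```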
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
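To illustrate the question-selection step summarised above, here is a loose sketch that picks the alternative on which candidate sorting models disagree most; it substitutes an ensemble-disagreement heuristic for the paper's information amount measures and max-margin model, so treat it as an analogy rather than the method itself. `assign(model, alt)` is a hypothetical stand-in returning a class index.

```python
def most_informative_alternative(unlabeled, candidate_models, assign):
    """Pick the alternative whose class assignment the candidate models
    disagree on most; asking the decision maker about it prunes the
    largest share of inconsistent models."""
    def disagreement(alt):
        votes = [assign(model, alt) for model in candidate_models]
        majority = max(votes.count(cls) for cls in set(votes))
        return 1.0 - majority / len(votes)  # 0 = unanimous, higher = more informative
    return max(unlabeled, key=disagreement)
```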
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on query complexity.
We validate our model on a set of open-domain QA datasets covering multiple query complexities, and show that it enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
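A minimal sketch of complexity-based routing as summarised above, assuming hypothetical `classifier`, `llm`, and `retriever` interfaces and three illustrative complexity labels:

```python
def adaptive_qa(query, classifier, llm, retriever):
    """Route a query to a retrieval strategy matching its predicted complexity."""
    complexity = classifier.predict(query)  # e.g. 'simple' | 'moderate' | 'complex'
    if complexity == "simple":
        return llm.answer(query)                # no retrieval needed
    if complexity == "moderate":
        docs = retriever.search(query, top_k=5)
        return llm.answer(query, context=docs)  # single-step retrieval
    # 'complex': iterative multi-step retrieval and reasoning
    answer, docs = None, []
    for _ in range(3):
        hop_query = query if answer is None else answer
        docs += retriever.search(hop_query, top_k=5)
        answer = llm.answer(query, context=docs)
    return answer
```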
- Aligning Large Language Models by On-Policy Self-Judgment [49.31895979525054]
Existing approaches for aligning large language models with human preferences face a trade-off: on-policy learning typically requires a separate reward model (RM).
We present SELF-JUDGE, a novel alignment framework that performs on-policy learning and is parameter efficient.
We show that rejection sampling by itself can improve performance further without an additional evaluator.
arXiv Detail & Related papers (2024-02-17T11:25:26Z)
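A minimal sketch of the rejection-sampling step mentioned above, where the policy model judges its own samples so no separate evaluator is needed; `model.generate` and `model.judge` are hypothetical interfaces:

```python
def best_of_n_self_judge(model, prompt, n=4):
    """Best-of-n sampling with the policy model as its own judge."""
    candidates = [model.generate(prompt) for _ in range(n)]
    best = candidates[0]
    for challenger in candidates[1:]:
        # judge() returns whichever of the two responses it prefers
        best = model.judge(prompt, best, challenger)
    return best
```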
- End-to-End Autoregressive Retrieval via Bootstrapping for Smart Reply Systems [7.2949782290577945]
We consider a novel approach that learns the smart reply task end-to-end from a dataset of (message, reply set) pairs obtained via bootstrapping.
Empirical results show this method consistently outperforms a range of state-of-the-art baselines across three datasets.
arXiv Detail & Related papers (2023-10-29T09:56:17Z)
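A minimal sketch of how such (message, reply set) pairs might be bootstrapped from an existing ranking model, assuming a hypothetical `base_model.top_replies` interface:

```python
def bootstrap_reply_sets(messages, base_model, k=3):
    """Build (message, reply set) pairs as pseudo-labels, so a model can
    learn set prediction end-to-end instead of diversifying post hoc."""
    dataset = []
    for message in messages:
        reply_set = base_model.top_replies(message, k=k)
        dataset.append((message, reply_set))
    return dataset
```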
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
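A minimal sketch of query-dependent prompt selection in the spirit of the summary above: a reward model learned offline from (query, prompt, outcome) logs scores candidate prompts for each new query without calling the LLM. `reward_model.predict` is a hypothetical interface.

```python
def pick_prompt(query, prompt_pool, reward_model):
    """Choose the prompt the offline-learned reward model expects to work
    best for this specific query (query-dependent, zero extra LLM calls)."""
    return max(prompt_pool, key=lambda prompt: reward_model.predict(query, prompt))
```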
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO), which combines a large language model with a smaller task-oriented dialogue (TOD) model.
This approach uses the LLM as an annotation-free user simulator to assess dialogue responses, combining it with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
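A minimal sketch of the simulator-as-feedback loop summarised above; `tod_model.generate`, `tod_model.reinforce`, and `llm_simulator.rate` are hypothetical interfaces, and the update rule stands in for whatever optimisation the paper actually uses:

```python
def ugro_step(dialogue_context, tod_model, llm_simulator, n=5):
    """Score candidate responses with an LLM user simulator and use the
    best score as annotation-free feedback for the smaller TOD model."""
    candidates = [tod_model.generate(dialogue_context) for _ in range(n)]
    scored = [(llm_simulator.rate(dialogue_context, c), c) for c in candidates]
    reward, best = max(scored, key=lambda pair: pair[0])
    tod_model.reinforce(dialogue_context, best, reward)  # e.g. a policy-gradient update
    return best
```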
- Improving Hyperparameter Optimization by Planning Ahead [3.8673630752805432]
We propose a novel transfer learning approach, defined within the context of model-based reinforcement learning.
We propose a new variant of model predictive control which employs a simple look-ahead strategy as a policy.
Our experiments on three meta-datasets, compared against state-of-the-art HPO algorithms, show that the proposed method can outperform all baselines.
arXiv Detail & Related papers (2021-10-15T11:46:14Z)
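A minimal sketch of the look-ahead policy summarised above: a learned dynamics model predicts the next validation score for each candidate configuration given the trajectory so far, and the most promising one is evaluated for real. `dynamics_model.predict` and `objective` are hypothetical stand-ins.

```python
def mpc_choose_config(candidates, history, dynamics_model):
    """One-step look-ahead: pick the configuration with the best predicted
    next validation score given the observed trajectory."""
    return max(candidates, key=lambda cfg: dynamics_model.predict(history, cfg))

def hpo_loop(objective, candidates, dynamics_model, budget=20):
    history = []
    for _ in range(budget):
        cfg = mpc_choose_config(candidates, history, dynamics_model)
        history.append((cfg, objective(cfg)))  # the real, expensive evaluation
    return max(history, key=lambda pair: pair[1])
```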
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks: next session prediction, utterance restoration, incoherence detection, and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experimental results indicate that the proposed auxiliary self-supervised tasks bring significant improvements to multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
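A minimal sketch of the multi-task objective summarised above, combining the response-selection loss with the four auxiliary losses as a weighted sum; each `model.*_loss` is a hypothetical per-task head, and the single shared weight is an illustrative simplification:

```python
def multitask_loss(batch, model, aux_weight=0.5):
    """Response-selection loss plus the four auxiliary self-supervised
    losses, jointly optimised as one objective."""
    loss = model.response_selection_loss(batch)
    for aux_loss in (
        model.next_session_prediction_loss,
        model.utterance_restoration_loss,
        model.incoherence_detection_loss,
        model.consistency_discrimination_loss,
    ):
        loss = loss + aux_weight * aux_loss(batch)
    return loss
```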
This list is automatically generated from the titles and abstracts of the papers on this site.