DeepThink: Aligning Language Models with Domain-Specific User Intents
- URL: http://arxiv.org/abs/2502.05497v2
- Date: Thu, 13 Feb 2025 13:22:40 GMT
- Title: DeepThink: Aligning Language Models with Domain-Specific User Intents
- Authors: Yang Li, Mingxuan Luo, Yeyun Gong, Chen Lin, Jian Jiao, Yi Liu, Kaili Huang,
- Abstract summary: This study proposes a novel framework called DeepThink to generate high-quality instructions.
DeepThink first generates a few seed questions to mimic actual user questions, simulates conversations to uncover hidden user needs, and refines the answers based on the conversational contexts.
Experiments demonstrate that DeepThink achieves an average performance improvement of 7.92% compared to a GPT-4-turbo+RAG-based assistant on the real user test set in the advertising domain.
- Score: 25.503609067258278
- Abstract: Supervised fine-tuning with synthesized instructions has been a common practice for adapting LLMs to domain-specific QA tasks. However, the synthesized instructions deviate from real user questions and expected answers. This study proposes a novel framework called DeepThink to generate high-quality instructions. DeepThink first generates a few seed questions to mimic actual user questions, simulates conversations to uncover hidden user needs, and refines the answers based on the conversational contexts and the retrieved documents to make them more comprehensive. Experiments demonstrate that DeepThink achieves an average performance improvement of 7.92% over a GPT-4-turbo+RAG-based assistant on a real-user test set in the advertising domain, across dimensions such as relevance, completeness, clarity, accuracy, and actionability.
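To make the three stages concrete (seed questions, simulated conversations, retrieval-grounded refinement), here is a minimal sketch under stated assumptions: `call_llm` and `retrieve` are hypothetical stand-ins for any chat-completion API and any document retriever, not the paper's released code, and the prompts are illustrative.

```python
# Minimal sketch of a DeepThink-style instruction-synthesis pipeline.
# `call_llm` and `retrieve` are hypothetical stubs, not the paper's code.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def retrieve(query: str, k: int = 5) -> list[str]:
    """Stub for a domain-document retriever."""
    raise NotImplementedError

def deepthink_pair(domain: str, turns: int = 3) -> tuple[str, str]:
    # Stage 1: seed question that mimics a real user of the domain.
    question = call_llm(f"Write one realistic user question about {domain}.")

    # Stage 2: simulate a short dialogue to surface hidden user needs.
    history = [f"User: {question}"]
    for _ in range(turns):
        history.append(call_llm("Continue this support dialogue, alternating "
                                "assistant and user turns:\n" + "\n".join(history)))

    # Stage 3: refine the answer using the conversation plus retrieved docs.
    convo = "\n".join(history)
    docs = "\n".join(retrieve(question))
    answer = call_llm("Write one comprehensive answer to the original question, "
                      "using the conversation and documents below.\n"
                      f"Conversation:\n{convo}\nDocuments:\n{docs}\n"
                      f"Question: {question}")
    return question, answer  # one synthesized (instruction, answer) pair
```

Pairs produced this way would then feed supervised fine-tuning of the domain assistant.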
Related papers
- ScopeQA: A Framework for Generating Out-of-Scope Questions for RAG [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel guided-hallucination method for efficiently generating a diverse set of borderline out-of-scope questions that can confuse such systems.
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
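A minimal sketch of how such guided hallucination might be wired up; `call_llm` and the prompts are assumptions for illustration, not the paper's implementation.

```python
# Guided hallucination for out-of-scope question generation: show the model
# what the corpus covers, then ask it to invent nearby topics it does NOT
# cover and write borderline questions for them. `call_llm` is a stub.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def out_of_scope_questions(corpus_summary: str, n: int = 5) -> list[str]:
    # Hallucinate plausible-but-uncovered topics near the corpus boundary.
    topics = call_llm(
        f"A knowledge base covers: {corpus_summary}\n"
        f"List {n} closely related topics it does NOT cover, one per line."
    ).splitlines()
    # Turn each hallucinated topic into a borderline user question.
    return [call_llm(f"Write a plausible user question about: {t.strip()}")
            for t in topics if t.strip()]
```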
- CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation [18.39379838806384]
We propose a novel critique-suggestion-guided automatic Prompt Optimization (CriSPO) approach.
CriSPO introduces a critique-suggestion module as its core component.
This module spontaneously discovers aspects and compares generated and reference texts across these aspects, providing actionable suggestions for prompt modification.
To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics.
arXiv Detail & Related papers (2024-10-03T17:57:01Z)
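A rough sketch of a critique-suggestion loop in this spirit; `call_llm`, the dev-set format, and the prompts are assumptions, not the authors' code.

```python
# Critique-suggestion prompt optimization: critique the current prompt's
# outputs against references, then rewrite the prompt from the suggestions.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def optimize_prompt(prompt: str, dev_set: list[tuple[str, str]],
                    steps: int = 3) -> str:
    for _ in range(steps):
        # Generate outputs for the dev inputs with the current prompt.
        outputs = [call_llm(prompt + "\n" + x) for x, _ in dev_set]
        pairs = "\n".join(f"OUTPUT: {o}\nREFERENCE: {r}"
                          for o, (_, r) in zip(outputs, dev_set))
        # Critique: discover aspects, compare outputs to references, and
        # collect concrete suggestions for editing the prompt.
        suggestions = call_llm(
            "Compare each OUTPUT to its REFERENCE across aspects you identify "
            "(e.g., coverage, tone, length) and suggest prompt edits:\n" + pairs)
        # Suggestion: apply the edits to produce the next candidate prompt.
        prompt = call_llm("Rewrite the prompt to apply the suggestions.\n"
                          f"Prompt: {prompt}\nSuggestions: {suggestions}")
    return prompt
```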
- Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA [16.1357049130957]
We build on the single-turn SELF-RAG framework and propose SELF-multi-RAG for conversational settings.
SELF-multi-RAG demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages.
arXiv Detail & Related papers (2024-09-23T20:05:12Z)
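SELF-multi-RAG trains the model to make these decisions itself; the sketch below only approximates the same three decisions (when to retrieve, what to rewrite, how to respond) with plain prompting, and all helpers are hypothetical.

```python
# Retrieve/rewrite/respond decisions for conversational RAG, approximated
# with prompting. `call_llm` and `retrieve` are stubs.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def retrieve(query: str, k: int = 5) -> list[str]:
    """Stub for a passage retriever."""
    raise NotImplementedError

def answer_turn(history: list[str], question: str) -> str:
    convo = "\n".join(history + [f"User: {question}"])
    # When to retrieve: let the model judge if external evidence is needed.
    needs = call_llm("Does answering the last turn require retrieval? "
                     f"Answer yes or no.\n{convo}")
    if "yes" in needs.lower():
        # What to rewrite: turn multi-turn context into a standalone query.
        query = call_llm("Rewrite the last user turn as a self-contained "
                         f"search query.\n{convo}")
        passages = "\n".join(retrieve(query))
        return call_llm(f"Answer using these passages:\n{passages}\n{convo}")
    # How to respond: answer directly when retrieval is unnecessary.
    return call_llm(f"Answer the last user turn.\n{convo}")
```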
- MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
The Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains: tool use, Directed Acyclic Graph (DAG) QA, data science and machine learning coding, contest-level programming, and mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
- Venn Diagram Prompting: Accelerating Comprehension with Scaffolding Effect [0.0]
We introduce Venn Diagram (VD) Prompting, a prompting technique that allows Large Language Models (LLMs) to combine and synthesize information across documents.
Our proposed technique also aims to eliminate the inherent position bias of LLMs, improving answer consistency by removing sensitivity to the order of the input information.
In experiments on four public benchmark question-answering datasets, VD prompting consistently matches or surpasses the performance of a meticulously crafted instruction prompt.
arXiv Detail & Related papers (2024-06-08T06:27:26Z)
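One plausible reading of the idea is a prompt that frames the documents as an unordered set whose overlapping and unique facts must be merged; the template below is an illustrative assumption, not the paper's exact prompt.

```python
# Venn-Diagram-style prompt construction: ask the model to merge the
# overlap and the unique facts of an UNORDERED set of documents, which is
# one way to reduce sensitivity to document order. `call_llm` is a stub.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def vd_prompt(question: str, docs: list[str]) -> str:
    blocks = "\n\n".join(f"[Doc {i + 1}]\n{d}" for i, d in enumerate(docs))
    return ("Treat the documents below as an UNORDERED set.\n"
            "1. Identify facts shared by several documents (the overlap).\n"
            "2. Identify facts unique to each document.\n"
            "3. Synthesize both into one answer, ignoring document order.\n\n"
            f"{blocks}\n\nQuestion: {question}")

# Usage: answer = call_llm(vd_prompt(question, retrieved_docs))
```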
- Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG system on an automatically generated synthetic exam composed of multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T13:14:11Z)
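A minimal sketch of exam-based RAG evaluation: generate multiple-choice questions from the task corpus, then grade the system's letter answers. `call_llm`, `rag_answer`, and the "ANSWER:" convention are assumptions for illustration.

```python
# Exam-based evaluation of a RAG system with auto-generated MCQs.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API (the exam generator)."""
    raise NotImplementedError

def rag_answer(question: str) -> str:
    """Stub for the RAG system under test; returns a letter A-D."""
    raise NotImplementedError

def make_exam(passages: list[str]) -> list[dict]:
    exam = []
    for p in passages:
        raw = call_llm("From the passage, write one multiple-choice question "
                       "with options A-D and end with 'ANSWER: <letter>'.\n"
                       "Passage: " + p)
        body, _, key = raw.rpartition("ANSWER:")
        exam.append({"question": body.strip(), "answer": key.strip()[:1]})
    return exam

def score(exam: list[dict]) -> float:
    # Accuracy of the RAG system's first-letter answers against the key.
    correct = sum(rag_answer(q["question"]).strip()[:1] == q["answer"]
                  for q in exam)
    return correct / len(exam)
```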
- Answer is All You Need: Instruction-following Text Embedding via Answering the Question [41.727700155498546]
This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation accordingly.
Specifically, we propose InBedder that instantiates this embed-via-answering idea by only fine-tuning language models on abstractive question answering tasks.
arXiv Detail & Related papers (2024-02-15T01:02:41Z)
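InBedder itself fine-tunes the language model on abstractive QA and reads the representation from the model; the sketch below only approximates the embed-via-answering idea with a generate-then-embed pipeline, and both helpers are hypothetical.

```python
# Embed-via-answering, approximated: treat the instruction as a question
# about the text, generate the answer, and embed the answer so that texts
# with similar answers land close together.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def embed(text: str) -> list[float]:
    """Stub for any sentence encoder."""
    raise NotImplementedError

def instruction_embedding(instruction: str, text: str) -> list[float]:
    answer = call_llm(f"Text: {text}\nQuestion: {instruction}\nAnswer briefly:")
    return embed(answer)  # the answer's embedding is the representation
```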
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
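A minimal sketch of the search-augmentation pattern: fetch evidence from a search engine, order it by date, and prepend it to the question. The `search` result schema and prompts are assumptions, not the paper's released code.

```python
# FreshPrompt-style search augmentation for questions about current facts.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def search(query: str, k: int = 5) -> list[dict]:
    """Stub for a search API; assumed to return {'date': ..., 'snippet': ...}."""
    raise NotImplementedError

def fresh_answer(question: str) -> str:
    # Oldest first, so the most recent evidence sits closest to the question.
    results = sorted(search(question), key=lambda r: r["date"])
    evidence = "\n".join(f"[{r['date']}] {r['snippet']}" for r in results)
    return call_llm("Answer using the evidence; prefer the most recent "
                    f"sources.\nEvidence:\n{evidence}\n\nQuestion: {question}")
```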
- ExpertPrompting: Instructing Large Language Models to be Distinguished Experts [93.58012324415762]
ExpertPrompting elicits the potential of large language models to answer as distinguished experts.
We produce a new set of instruction-following data using GPT-3.5, and train a competitive open-source chat assistant called ExpertLLaMA.
arXiv Detail & Related papers (2023-05-24T03:51:31Z)
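The core prompting pattern is simple enough to sketch: first ask the model to describe an expert suited to the instruction, then answer in that persona. `call_llm` and the prompts are illustrative assumptions.

```python
# ExpertPrompting-style two-step prompting: synthesize an expert identity,
# then condition the answer on it.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def expert_answer(instruction: str) -> str:
    # Step 1: describe the ideal distinguished expert for this instruction.
    persona = call_llm("Describe, in a short paragraph, the ideal "
                       "distinguished expert to carry out this instruction: "
                       + instruction)
    # Step 2: answer as that expert.
    return call_llm(f"{persona}\n\nAs the expert described above, {instruction}")
```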
- TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement Learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves a 5.33x average improvement in sample efficiency compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
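TEMPERA learns an RL policy over discrete per-query prompt edits; the sketch below only enumerates a small edit space (swap the instruction, reorder the few-shot examples) and picks the best prompt with a scorer standing in for the learned policy. All names are hypothetical.

```python
# Discrete test-time prompt editing, with greedy search standing in for
# the RL policy that TEMPERA actually trains.

import itertools

def score(prompt: str, query: str) -> float:
    """Stub reward signal, e.g. from a small validation set."""
    raise NotImplementedError

def edit_prompts(instructions: list[str], examples: list[str], query: str):
    for instr in instructions:                          # edit 1: swap instruction
        for ex in itertools.permutations(examples):     # edit 2: reorder examples
            yield instr + "\n" + "\n".join(ex) + "\n" + query

def best_prompt(instructions: list[str], examples: list[str],
                query: str) -> str:
    # Exhaustive here for clarity; the point of the RL policy is to pick
    # good edits without enumerating the whole space.
    return max(edit_prompts(instructions, examples, query),
               key=lambda p: score(p, query))
```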
- Open-Retrieval Conversational Machine Reading [80.13988353794586]
In conversational machine reading, systems need to interpret natural language rules, answer high-level questions, and ask follow-up clarification questions.
Existing works assume the rule text is provided for each user question, which neglects the essential retrieval step in real scenarios.
In this work, we propose and investigate an open-retrieval setting of conversational machine reading.
arXiv Detail & Related papers (2021-02-17T08:55:01Z)
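A minimal sketch of the open-retrieval setting: retrieve candidate rule texts instead of assuming the rule is given, then either answer or ask a clarification question. The helpers and the decision prompt are assumptions for illustration.

```python
# Open-retrieval conversational machine reading: search a rulebook for
# candidate rules, then answer Yes/No or ask one clarification question.

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion API."""
    raise NotImplementedError

def retrieve_rules(question: str, k: int = 3) -> list[str]:
    """Stub retriever over the rulebook (the step prior work skipped)."""
    raise NotImplementedError

def cmr_turn(question: str, history: list[str]) -> str:
    rules = "\n".join(retrieve_rules(question))
    convo = "\n".join(history)
    return call_llm(
        "Given the rules and the dialogue so far, reply with Yes or No, or, "
        "if a rule condition is still unresolved, ask one clarification "
        f"question.\nRules:\n{rules}\nDialogue:\n{convo}\n"
        f"User question: {question}")
```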
This list is automatically generated from the titles and abstracts of the papers on this site.