From LLM to Conversational Agent: A Memory Enhanced Architecture with
Fine-Tuning of Large Language Models
- URL: http://arxiv.org/abs/2401.02777v2
- Date: Tue, 30 Jan 2024 07:02:30 GMT
- Authors: Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui
- Abstract summary: RAISE (Reasoning and Acting through Scratchpad and Examples) is an advanced architecture enhancing the integration of Large Language Models (LLMs) into conversational agents.
It incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations.
- Score: 11.999652715036643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and
Examples), an advanced architecture enhancing the integration of Large Language
Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of
the ReAct framework, incorporates a dual-component memory system, mirroring
human short-term and long-term memory, to maintain context and continuity in
conversations. It entails a comprehensive agent construction scenario,
including phases like Conversation Selection, Scene Extraction, CoT Completion,
and Scene Augmentation, leading to the LLMs Training phase. This approach
appears to enhance agent controllability and adaptability in complex,
multi-turn dialogues. Our preliminary evaluations in a real estate sales
context suggest that RAISE has some advantages over traditional agents,
indicating its potential for broader applications. This work contributes to the
AI field by providing a robust framework for developing more context-aware and
versatile conversational agents.
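The dual-component memory described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: it assumes that the scratchpad plays the short-term role and a pool of retrieved examples plays the long-term role, which the acronym suggests but the abstract does not spell out. The class names (Scratchpad, ExamplePool), the build_prompt helper, and the word-overlap retrieval are all hypothetical stand-ins chosen for illustration; a real system would more likely use embedding-based retrieval and pass the prompt to an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Hypothetical short-term memory: notes kept during one conversation."""
    entries: list = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        # Store each note with a label such as "Thought" or "Observation".
        self.entries.append(f"{kind}: {text}")

    def render(self) -> str:
        return "\n".join(self.entries)


@dataclass
class ExamplePool:
    """Hypothetical long-term memory: stored examples retrieved per query."""
    examples: list

    def retrieve(self, query: str, k: int = 2) -> list:
        # Toy relevance score (an assumption): count of shared lowercase tokens.
        query_words = set(query.lower().split())

        def overlap(example: str) -> int:
            return len(query_words & set(example.lower().split()))

        # sorted() is stable, so ties keep their original order.
        return sorted(self.examples, key=overlap, reverse=True)[:k]


def build_prompt(user_query: str, scratchpad: Scratchpad, pool: ExamplePool) -> str:
    """Combine retrieved examples, scratchpad notes, and the query into one prompt."""
    examples = pool.retrieve(user_query)
    return "\n\n".join([
        "Examples:\n" + "\n".join(examples),
        "Scratchpad:\n" + scratchpad.render(),
        "User: " + user_query,
    ])


if __name__ == "__main__":
    pool = ExamplePool([
        "Q: price of the flat? A: It is listed at 500k.",
        "Q: is there parking? A: Yes, two spaces.",
    ])
    pad = Scratchpad()
    pad.add("Observation", "user budget is 500k")
    print(build_prompt("What is the price of the flat?", pad, pool))
```

Keeping the two memories as separate objects lets the scratchpad be reset per conversation while the example pool persists across conversations, which is one plausible reading of the short-term versus long-term split.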
Related papers
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models [13.963676467274109]
We extend the capabilities of HELPER by expanding its memory with a wider array of examples and prompts.
This simple expansion of HELPER into a shared memory enables the agent to work across domains executing plans from dialogue, natural language instruction, active question asking, and common room reorganization.
We evaluate the agent on four diverse interactive visual-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task.
arXiv Detail & Related papers (2024-04-29T19:12:42Z)
- DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton [44.26173742405563]
This paper introduces the retrieval-augmented large language model with Definite Finite Automaton (DFA-RAG).
DFA-RAG is a framework designed to enhance the capabilities of conversational agents using large language models (LLMs).
arXiv Detail & Related papers (2024-02-06T21:14:45Z)
- DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
- Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompt into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large
Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction.
Most existing RL approaches to DM train the agent at the word level, and thus have to deal with a combinatorially complex action space even for a medium-size vocabulary.
We develop a RL-based DM using a novel mixture of expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a
arXiv Detail & Related papers (2022-05-31T19:00:41Z)
- Video-Grounded Dialogues with Pretrained Generation Language Models [88.15419265622748]
We leverage the power of pre-trained language models for improving video-grounded dialogue.
We propose a framework by formulating video-grounded dialogue tasks as a sequence-to-sequence task.
Our framework allows fine-tuning language models to capture dependencies across multiple modalities.
arXiv Detail & Related papers (2020-06-27T08:24:26Z)
- Exploring Recurrent, Memory and Attention Based Architectures for
Scoring Interactional Aspects of Human-Machine Text Dialog [9.209192502526285]
This paper builds on previous work in this direction to investigate multiple neural architectures.
We conduct experiments on a conversational database of text dialogs from human learners interacting with a cloud-based dialog system.
We find that fusion of multiple architectures performs competently on our automated scoring task relative to expert inter-rater agreements.
arXiv Detail & Related papers (2020-05-20T03:23:00Z)