Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
- URL: http://arxiv.org/abs/2409.12059v2
- Date: Fri, 27 Sep 2024 13:07:26 GMT
- Title: Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
- Authors: Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji,
- Abstract summary: Large Language Model can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms.
In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model architecture called TaS.
We train the language model by the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses.
- Score: 8.02728252625147
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Model can reasonably understand and generate human expressions but may lack of thorough thinking and reasoning mechanisms. Recently there have been several studies which enhance the thinking ability of language models but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model architecture called TaS which allows it to first consider the thoughts and then express the response based upon the query. We design several pipelines to annotate or generate the thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model by the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
Related papers
- Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities [30.96613796974929]
We introduce a simple method to unlock the visual reasoning capabilities of multimodal large language models.
Whiteboard-of-thought prompting provides models with a metaphorical whiteboard' to draw out reasoning steps as images.
This simple approach shows state-of-the-art results on four difficult natural language tasks.
arXiv Detail & Related papers (2024-06-20T17:59:45Z) - What Makes Language Models Good-enough? [11.763229353978321]
Psycholinguistic research suggests that humans may build a representation of linguistic input that is 'good-enough' for the task at hand.
This study examines what architectural features make language models learn human-like good-enough language processing.
arXiv Detail & Related papers (2024-06-06T00:51:28Z) - Language Generation from Brain Recordings [68.97414452707103]
We propose a generative language BCI that utilizes the capacity of a large language model and a semantic brain decoder.
The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli.
Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.
arXiv Detail & Related papers (2023-11-16T13:37:21Z) - From Word Models to World Models: Translating from Natural Language to
the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z) - Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play
Multi-Character Belief Tracker [72.09076317574238]
ToM is a plug-and-play approach to investigate the belief states of characters in reading comprehension.
We show that ToM enhances off-the-shelf neural network theory mind in a zero-order setting while showing robust out-of-distribution performance compared to supervised baselines.
arXiv Detail & Related papers (2023-06-01T17:24:35Z) - Language Models are Bounded Pragmatic Speakers: Understanding RLHF from
a Bayesian Cognitive Modeling Perspective [2.8282906214258805]
This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker.
We demonstrate that large language models fine-tuned with reinforcement learning from human feedback embody a model of thought that resembles a fast-and-slow model.
arXiv Detail & Related papers (2023-05-28T16:04:48Z) - Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
arXiv Detail & Related papers (2022-01-28T02:33:07Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language
Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data, created from one resource rich language, e.g., English, to other languages, less rich in terms of resources.
arXiv Detail & Related papers (2021-02-20T03:52:08Z) - Learning Spoken Language Representations with Neural Lattice Language
Modeling [39.50831917042577]
We propose a framework that trains neural lattice language models to provide contextualized representations for spoken language understanding tasks.
The proposed two-stage pre-training approach reduces the demands of speech data and has better efficiency.
arXiv Detail & Related papers (2020-07-06T10:38:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.