MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning
- URL: http://arxiv.org/abs/2409.12059v3
- Date: Tue, 17 Dec 2024 16:30:39 GMT
- Title: MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning
- Authors: Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji
- Abstract summary: Large language models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms.
Motivated by cognitive mechanisms in the natural world, the authors design a novel model architecture called TaS.
They train the language model on thought-augmented data so that the thinking layer automatically generates reasonable thoughts and the model finally outputs more reasonable responses.
- Score: 10.478620397712076
- License:
- Abstract: Large language models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Several recent studies enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world and design a novel model architecture called TaS, which allows the model to first consider its thoughts and then express the response based upon the query. We design several pipelines to annotate or generate the thought contents from prompt-response samples, then add language heads in a middle layer, which behaves as the thinking layer. We train the language model on the thought-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
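The abstract describes adding a language head to a middle layer so that this layer produces thoughts, while the final layer produces the response. The toy sketch below only illustrates that two-head layout and is not the authors' implementation. The class `ToyTwoHeadModel`, its parameters (`hidden`, `vocab`, `num_layers`, `think_layer`), the ReLU layers, and the random weights are all hypothetical stand-ins for a real transformer language model.

```python
import random


def linear(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def rand_matrix(rng, rows, cols):
    """Random weights; a real model would learn these during training."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def argmax(values):
    """Index of the largest entry, used here as a greedy token choice."""
    return max(range(len(values)), key=values.__getitem__)


class ToyTwoHeadModel:
    """Hypothetical toy: a layer stack with one head in the middle and one at the end."""

    def __init__(self, hidden=4, vocab=6, num_layers=4, think_layer=2, seed=0):
        rng = random.Random(seed)
        self.layers = [rand_matrix(rng, hidden, hidden) for _ in range(num_layers)]
        self.think_layer = think_layer  # assumed position of the thinking layer
        self.think_head = rand_matrix(rng, hidden, vocab)     # decodes thoughts
        self.response_head = rand_matrix(rng, hidden, vocab)  # decodes the response

    def forward(self, x):
        think_logits = None
        h = x
        for i, w in enumerate(self.layers, start=1):
            h = [max(0.0, v) for v in linear(h, w)]  # simple ReLU layer
            if i == self.think_layer:
                # The middle-layer hidden state feeds the thinking head.
                think_logits = linear(h, self.think_head)
        # The final hidden state feeds the response head.
        response_logits = linear(h, self.response_head)
        return think_logits, response_logits


model = ToyTwoHeadModel()
think_logits, response_logits = model.forward([1.0, 0.5, -0.5, 0.25])
thought_token = argmax(think_logits)
response_token = argmax(response_logits)
```

In this sketch both heads share the lower layers, so a loss on thought tokens at the middle layer and a loss on response tokens at the final layer could be trained jointly. Whether TaS combines its losses this way is not stated in the abstract.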
Related papers
- Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities [30.96613796974929]
We introduce a simple method to unlock the visual reasoning capabilities of multimodal large language models.
Whiteboard-of-thought prompting provides models with a metaphorical 'whiteboard' to draw out reasoning steps as images.
This simple approach shows state-of-the-art results on four difficult natural language tasks.
arXiv Detail & Related papers (2024-06-20T17:59:45Z)
- What Makes Language Models Good-enough? [11.763229353978321]
Psycholinguistic research suggests that humans may build a representation of linguistic input that is 'good-enough' for the task at hand.
This study examines what architectural features make language models learn human-like good-enough language processing.
arXiv Detail & Related papers (2024-06-06T00:51:28Z)
- Language Generation from Brain Recordings [68.97414452707103]
We propose a generative language BCI that utilizes the capacity of a large language model and a semantic brain decoder.
The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli.
Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.
arXiv Detail & Related papers (2023-11-16T13:37:21Z)
- BRAINTEASER: Lateral Thinking Puzzles for Large Language Models [15.95314613982879]
BRAINTEASER is a multiple-choice Question Answering task designed to test the model's ability to exhibit lateral thinking.
Our experiments with state-of-the-art instruction- and commonsense language models reveal a significant gap between human and model performance.
We make all of our code and data available to stimulate work on developing and evaluating lateral thinking models.
arXiv Detail & Related papers (2023-10-08T07:46:01Z)
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z)
- Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker [72.09076317574238]
SymbolicToM is a plug-and-play approach to investigate the belief states of characters in reading comprehension.
We show that SymbolicToM enhances off-the-shelf neural networks' theory of mind in a zero-shot setting while showing robust out-of-distribution performance compared to supervised baselines.
arXiv Detail & Related papers (2023-06-01T17:24:35Z)
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models [52.31950122881687]
We introduce a new framework for language model inference, Tree of Thoughts (ToT).
ToT generalizes over the popular Chain of Thought approach to prompting language models.
Our experiments show that ToT significantly enhances language models' problem-solving abilities.
arXiv Detail & Related papers (2023-05-17T23:16:17Z)
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.