BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained
Transformer
- URL: http://arxiv.org/abs/2307.00360v2
- Date: Tue, 15 Aug 2023 13:59:42 GMT
- Title: BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained
Transformer
- Authors: Zuchao Li, Shitou Zhang, Hai Zhao, Yifei Yang, Dongjie Yang
- Abstract summary: BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University.
It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio.
- Score: 77.28871523946418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BatGPT is a large-scale language model designed and trained jointly by Wuhan
University and Shanghai Jiao Tong University. It is capable of generating
highly natural and fluent text in response to various types of input, including
text prompts, images, and audio. At the modeling level, we employ a
bidirectional autoregressive architecture that allows the model to efficiently
capture the complex dependencies of natural language, making it highly
effective in tasks such as language generation, dialog systems, and question
answering. Moreover, the bidirectional autoregressive modeling operates not
only from left to right but also from right to left, effectively reducing
fixed memory effects and alleviating model hallucinations.
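As a rough illustration of this bidirectional objective (a minimal sketch, not the authors' released code; the `model` interface below is a hypothetical placeholder), the same decoder can be trained on both reading directions by also scoring the reversed sequence:

```python
# Hypothetical sketch of a bidirectional autoregressive training objective:
# the decoder predicts the next token left-to-right and, on the reversed
# sequence, right-to-left. Interface and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F

def bidirectional_ar_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: (batch, seq_len) LongTensor.
    `model(ids)` is assumed to return next-token logits of shape
    (batch, ids.size(1), vocab_size) under a causal mask."""
    def ar_loss(ids: torch.Tensor) -> torch.Tensor:
        logits = model(ids[:, :-1])          # predict token t from tokens < t
        targets = ids[:, 1:]
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )

    forward_loss = ar_loss(token_ids)                   # left-to-right pass
    backward_loss = ar_loss(token_ids.flip(dims=[1]))   # right-to-left pass
    return 0.5 * (forward_loss + backward_loss)
```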
For training, we propose a novel parameter expansion method for
leveraging the pre-training of smaller models and employ reinforcement learning
from both AI and human feedback, aimed at improving the model's alignment
performance. Overall, these approaches significantly improve the effectiveness
of BatGPT, and the model can be utilized for a wide range of natural language
applications.
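The abstract does not spell out the exact parameter expansion scheme; as a minimal, hypothetical sketch of the general idea (assuming a standard transformer block stack), a larger model can be seeded from a smaller pre-trained one by repeating its blocks:

```python
# Hypothetical sketch of "parameter expansion": reuse a smaller pre-trained
# model to initialize a deeper one. This is one common depth-expansion idea,
# not necessarily BatGPT's specific method.
import copy
import torch.nn as nn

def expand_depth(small_layers: nn.ModuleList, target_depth: int) -> nn.ModuleList:
    """Build `target_depth` transformer blocks by deep-copying the smaller
    stack's pre-trained blocks cyclically, in order."""
    expanded = [copy.deepcopy(small_layers[i % len(small_layers)])
                for i in range(target_depth)]
    return nn.ModuleList(expanded)

# Usage (names hypothetical): big_model.layers = expand_depth(small_model.layers, 48)
```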
Related papers
- SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation [56.913182262166316]
Chain-of-Information Generation (CoIG) is a method for decoupling semantic and perceptual information in large-scale speech generation.
SpeechGPT-Gen is efficient in semantic and perceptual information modeling.
It markedly excels in zero-shot text-to-speech, zero-shot voice conversion, and speech-to-speech dialogue.
arXiv Detail & Related papers (2024-01-24T15:25:01Z)
- Language-Guided World Models: A Model-Based Approach to AI Control [31.9089380929602]
This paper introduces the concept of Language-Guided World Models (LWMs)
LWMs are probabilistic models that can simulate environments by reading texts.
We take initial steps in developing robust LWMs that can generalize to compositionally novel language descriptions.
arXiv Detail & Related papers (2024-01-24T03:11:36Z)
- Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models [24.107358120517336]
In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.
An immediate consequence of this is that reward learning and language model fine-tuning can be performed jointly and directly, without requiring any further downstream policy optimization.
arXiv Detail & Related papers (2023-05-19T06:21:15Z)
- Bidirectional Language Models Are Also Few-shot Learners [54.37445173284831]
We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models.
We show SAP is effective on question answering and summarization.
For the first time, our results demonstrate that prompt-based learning is an emergent property of a broader class of language models.
arXiv Detail & Related papers (2022-09-29T01:35:57Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners [23.150999852147283]
This study proposes a novel, pluggable, and efficient approach named DifferentiAble pRompT (DART).
It can convert small language models into better few-shot learners without any prompt engineering.
A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance.
arXiv Detail & Related papers (2021-08-30T12:29:25Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space [109.79957125584252]
Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language.
In this paper, we propose the first large-scale language VAE model, Optimus.
arXiv Detail & Related papers (2020-04-05T06:20:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.