Language Models as Agent Models
- URL: http://arxiv.org/abs/2212.01681v1
- Date: Sat, 3 Dec 2022 20:18:16 GMT
- Title: Language Models as Agent Models
- Authors: Jacob Andreas
- Abstract summary: I argue that LMs are models of intentional communication in a specific, narrow sense.
Even in today's non-robust and error-prone models, LMs infer and use representations of fine-grained communicative intentions.
- Score: 42.37422271002712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) are trained on collections of documents, written by
individual human agents to achieve specific goals in an outside world. During
training, LMs have access only to text of these documents, with no direct
evidence of the internal states of the agents that produced them -- a fact
often used to argue that LMs are incapable of modeling goal-directed aspects of
human language production and comprehension. Can LMs trained on text learn
anything at all about the relationship between language and use? I argue that
LMs are models of intentional communication in a specific, narrow sense. When
performing next word prediction given a textual context, an LM can infer and
represent properties of an agent likely to have produced that context. These
representations can in turn influence subsequent LM generation in the same way
that agents' communicative intentions influence their language. I survey
findings from the recent literature showing that -- even in today's non-robust
and error-prone models -- LMs infer and use representations of fine-grained
communicative intentions and more abstract beliefs and goals. Despite the
limited nature of their training data, they can thus serve as building blocks
for systems that communicate and act intentionally.
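The mechanism described in the abstract can be sketched with a toy model (not from the paper): treat the LM as a mixture over latent "agents," infer a posterior over which agent produced the context, and let that posterior shape next-word prediction. The two agents, their corpora, and the unigram models below are all invented for illustration.

```python
from collections import Counter

# Two hypothetical "agents" with different goals produce different text.
CORPORA = {
    "optimist": "the plan will succeed and the team will win".split(),
    "pessimist": "the plan will fail and the team will lose".split(),
}

def word_dist(words):
    """Unigram next-word model per agent (toy stand-in for an LM component)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

MODELS = {agent: word_dist(words) for agent, words in CORPORA.items()}
PRIOR = {agent: 0.5 for agent in MODELS}

def posterior_over_agents(context):
    """Bayesian update: which agent likely produced this context?"""
    scores = {}
    for agent, model in MODELS.items():
        p = PRIOR[agent]
        for w in context:
            p *= model.get(w, 1e-6)  # smooth unseen words
        scores[agent] = p
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

def next_word_probs(context):
    """Marginalize next-word prediction over the inferred latent agent."""
    post = posterior_over_agents(context)
    vocab = {w for m in MODELS.values() for w in m}
    return {w: sum(post[a] * MODELS[a].get(w, 0.0) for a in MODELS)
            for w in vocab}

post = posterior_over_agents(["plan", "will", "succeed"])
probs = next_word_probs(["plan", "will", "succeed"])
```

Conditioning on a context containing "succeed" concentrates the posterior on the optimist, and the marginal prediction then favors that agent's continuations (e.g. "win" over "lose"): the inferred agent representation influences generation, which is the narrow sense in which the paper calls LMs agent models.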
Related papers
- From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z)
- Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
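The core idea of an embedding "language" can be illustrated with a minimal sketch (the vocabulary, embeddings, and distribution below are invented; the actual CIPHER method operates over a real LLM's token embeddings): instead of committing to one sampled token, a model sends the expectation of token embeddings under its output distribution, preserving information the argmax discards.

```python
# Hypothetical token embeddings and an LM's next-token distribution.
EMB = {"yes": [1.0, 0.0], "no": [0.0, 1.0], "maybe": [0.5, 0.5]}
DIST = {"yes": 0.6, "no": 0.1, "maybe": 0.3}

def sampled_message(dist):
    # Ordinary natural-language communication: commit to one token.
    return max(dist, key=dist.get)

def cipher_message(dist, emb):
    # CIPHER-style message: the probability-weighted average of token
    # embeddings, carrying the whole distribution rather than its mode.
    dim = len(next(iter(emb.values())))
    return [sum(dist[t] * emb[t][i] for t in dist) for i in range(dim)]

msg = cipher_message(DIST, EMB)
```

The embedding message encodes the model's residual uncertainty (here, the 30% mass on "maybe"), which a sampled token cannot convey to the other debaters.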
arXiv Detail & Related papers (2023-10-10T03:06:38Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [111.33545170562337]
We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps.
We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans.
We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
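The translation step can be sketched as nearest-neighbor matching of a free-form LM plan step against a fixed set of admissible actions (the paper uses pretrained sentence embeddings; the bag-of-words cosine similarity and the action list below are simplified stand-ins invented for illustration).

```python
import math
from collections import Counter

# Hypothetical set of actions the embodied agent can actually execute.
ADMISSIBLE = ["walk to kitchen", "open fridge", "grab apple", "close fridge"]

def bow(text):
    """Bag-of-words vector (toy substitute for a sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def to_admissible(step):
    """Map a free-form LM plan step to the closest admissible action."""
    return max(ADMISSIBLE, key=lambda act: cosine(bow(step), bow(act)))

action = to_admissible("go and open the fridge door")
```

A generated step like "go and open the fridge door" is snapped to the executable action "open fridge", keeping the LM's high-level plan while guaranteeing every emitted step is one the environment accepts.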
arXiv Detail & Related papers (2022-01-18T18:59:45Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Towards Continual Entity Learning in Language Models for Conversational Agents [0.5330240017302621]
We introduce entity-aware language models (EALM), where we integrate entity models trained on catalogues of entities into pre-trained LMs.
Our combined language model adaptively adds information from the entity models into the pre-trained LM depending on the sentence context.
We show significant perplexity improvements on task-oriented dialogue datasets, especially on long-tailed utterances.
arXiv Detail & Related papers (2021-07-30T21:10:09Z)
- Discourse structure interacts with reference but not syntax in neural language models [17.995905582226463]
We study the ability of language models (LMs) to learn interactions between different linguistic representations.
We find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax.
Our results suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement.
arXiv Detail & Related papers (2020-10-10T03:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.