Language Modeling with Latent Situations
- URL: http://arxiv.org/abs/2212.10012v1
- Date: Tue, 20 Dec 2022 05:59:42 GMT
- Title: Language Modeling with Latent Situations
- Authors: Belinda Z. Li, Maxwell Nye, Jacob Andreas
- Abstract summary: SituationSupervision is a family of approaches for improving coherence in language models.
It trains models to construct and condition on explicit representations of entities and their states.
It produces major coherence improvements of 4-11%.
- Score: 46.38670628102201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) often generate incoherent outputs: they refer to events
and entity states that are incompatible with the state of the world described
in their inputs. We introduce SituationSupervision, a family of approaches for
improving coherence in LMs by training them to construct and condition on
explicit representations of entities and their states. SituationSupervision has
two components: an auxiliary situation modeling task that trains models to
predict state representations in context, and a latent state inference
procedure that imputes these states from partially annotated training data.
SituationSupervision can be applied to both fine-tuning (by supervising LMs to
encode state variables in their hidden representations) and prompting (by
inducing LMs to interleave textual descriptions of entity states with output
text). In both cases, SituationSupervision requires only a small number of
state annotations to produce major coherence improvements (of 4-11%),
showing that standard LMs can be sample-efficiently trained to model not just
language but the situations it describes.
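To make the prompting variant concrete, here is a minimal sketch of interleaving textual state annotations with output text; the example passage, the bracketed state format, and the `complete` stand-in are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch of SituationSupervision-style prompting: few-shot
# demonstrations interleave text with explicit entity-state lines,
# inducing the LM to emit a state line before continuing the text.
# Passage, state format, and `complete` are illustrative assumptions.

DEMONSTRATIONS = """\
Text: John put the apple in his pocket and left the kitchen.
State: [John: in hallway; carrying apple] [apple: in John's pocket]
Text: In the hallway, he took the apple out and ate it.
State: [John: in hallway; hands empty] [apple: eaten]
"""

def build_prompt(new_text: str) -> str:
    """Continue the interleaved text/state format for a new passage."""
    return f"{DEMONSTRATIONS}Text: {new_text}\nState:"

def complete(prompt: str) -> str:
    """Stand-in for any LM completion call (API or local model)."""
    raise NotImplementedError

print(build_prompt("Mary filled the kettle and set it on the stove."))
```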
Related papers
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
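As a rough illustration of the pipeline described above, the sketch below retrieves grammar passages by naive token overlap and builds a correction prompt for an LLM; the grammar snippets, the scoring, and the prompt wording are assumptions, not the paper's implementation.

```python
# Sketch of an LLM-backed correction loop for morphological glossing:
# retrieve relevant passages from a written grammar, then prompt an LLM
# to revise a small model's draft gloss. Grammar snippets, overlap
# scoring, and prompt wording are illustrative assumptions.

GRAMMAR = [
    "Past tense is marked with the suffix -ka on verb stems.",
    "Plural nouns take the prefix ma-.",
    "The ergative case is marked by the clitic =n on subjects.",
]

def retrieve(query: str, k: int = 2) -> list:
    """Rank grammar passages by naive token overlap with the query."""
    terms = set(query.lower().split())
    return sorted(GRAMMAR,
                  key=lambda p: -len(terms & set(p.lower().split())))[:k]

def correction_prompt(sentence: str, draft_gloss: str) -> str:
    notes = "\n".join(retrieve(sentence + " " + draft_gloss))
    return (f"Grammar notes:\n{notes}\n\n"
            f"Sentence: {sentence}\n"
            f"Draft gloss (small model): {draft_gloss}\n"
            f"Corrected gloss:")

print(correction_prompt("lima-ka ma-wasi", "see-PST PL-house"))
```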
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
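Parameter averaging is one simple instance of such training-free merging. The sketch below interpolates the weights of two same-architecture models; the toy modules and the interpolation coefficient are assumptions for illustration (the paper experiments with Llama-2-7B variants).

```python
# Sketch of training-free model merging via parameter interpolation.
# The toy Linear modules and alpha are illustrative assumptions; the
# paper merges Llama-2-7B-based models with distinct capabilities.
import torch
import torch.nn as nn

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate matching parameters of two models."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

model_lang = nn.Linear(8, 8)  # stand-in: continually pre-trained on the language
model_task = nn.Linear(8, 8)  # stand-in: fine-tuned for the target task

merged = nn.Linear(8, 8)
merged.load_state_dict(merge_state_dicts(model_lang.state_dict(),
                                         model_task.state_dict()))
```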
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
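As a loose illustration of a state-summarizing CoT schema like EmMem, the sketch below folds the interaction history into a prompt that asks for a state summary before the next action; the transcript, state fields, and prompt format are assumptions, not LangSuitE's.

```python
# Sketch of a CoT prompt that summarizes embodied state from the
# interaction history before choosing an action. The transcript and
# state fields are illustrative assumptions, not LangSuitE's format.

HISTORY = [
    "You are in the kitchen. You see a fridge and a counter.",
    "> open fridge",
    "The fridge is open. Inside you see an apple.",
]

def emmem_prompt(history: list, goal: str) -> str:
    """Fold history into a prompt asking for a state summary, then an action."""
    transcript = "\n".join(history)
    return (f"Goal: {goal}\n"
            f"History:\n{transcript}\n"
            f"First summarize the current state "
            f"(location, inventory, object states), then give the next action.\n"
            f"State summary:")

print(emmem_prompt(HISTORY, "Put the apple on the counter."))
```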
- Context-Aware Machine Translation with Source Coreference Explanation [26.336947440529713]
We propose a model that explains its translation decisions by predicting coreference features in the input.
We evaluate our method on the English-German WMT document-level translation task, an English-Russian dataset, and the multilingual TED talk dataset.
arXiv Detail & Related papers (2024-04-30T12:41:00Z)
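One way to read this design is as multi-task learning: a shared encoder feeds both a translation head and an auxiliary coreference head over source tokens. The sketch below is a toy version under that assumption; the GRU encoder, dimensions, and label set are illustrative, not the paper's architecture.

```python
# Toy multi-task sketch: a shared encoder feeds a translation head and
# an auxiliary coreference head over source tokens, so translation
# decisions can be explained via the predicted coreference features.
# GRU encoder, sizes, and label set are illustrative assumptions.
import torch
import torch.nn as nn

class CorefAwareTranslator(nn.Module):
    def __init__(self, vocab: int = 1000, dim: int = 64, coref_labels: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.translate_head = nn.Linear(dim, vocab)     # target-token logits
        self.coref_head = nn.Linear(dim, coref_labels)  # per-token coref logits

    def forward(self, src: torch.Tensor):
        hidden, _ = self.encoder(self.embed(src))
        return self.translate_head(hidden), self.coref_head(hidden)

model = CorefAwareTranslator()
src = torch.randint(0, 1000, (2, 7))                    # batch of token ids
trans_logits, coref_logits = model(src)
# Training would combine a translation loss with a weighted coreference loss.
```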
- SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos [54.01116513202433]
We study the problem of procedure planning in instructional videos, which aims to make a goal-oriented sequence of action steps given partial visual state observations.
Recent works succeeded in sequence modeling of steps with only sequence-level annotations available during training, but overlooked the role of states in the procedures.
We aim to establish a more structured state space by investigating the causal relations between steps and states in procedures.
arXiv Detail & Related papers (2024-03-03T19:53:06Z)
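The causal step-state structure can be pictured as steps with before- and after-states that a plan must chain consistently. Below is a toy sketch under that assumption; the steps, states, and greedy chaining are illustrative, not the paper's method.

```python
# Toy sketch of chaining steps by their before/after states: a plan is
# causally consistent when each step's preconditions hold in the state
# produced so far. Steps and states are illustrative assumptions.

STEPS = {
    "crack egg": ({"egg whole"}, {"egg cracked"}),
    "whisk egg": ({"egg cracked"}, {"egg beaten"}),
    "fry egg":   ({"egg beaten"}, {"egg cooked"}),
}

def plan(start: set, goal: set) -> list:
    """Greedily apply any unused step whose preconditions are met."""
    state, chosen = set(start), []
    while not goal <= state:
        for name, (pre, post) in STEPS.items():
            if pre <= state and name not in chosen:
                state = (state - pre) | post
                chosen.append(name)
                break
        else:
            break  # no applicable step; give up
    return chosen

print(plan({"egg whole"}, {"egg cooked"}))  # ['crack egg', 'whisk egg', 'fry egg']
```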
- Coalescing Global and Local Information for Procedural Text Understanding [70.10291759879887]
A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and a global view of the outputs.
In this paper, we propose Coalescing Global and Local Information (CGLI), a new model that builds entity and time representations.
Experiments on a popular procedural text understanding dataset show that our model achieves state-of-the-art results.
arXiv Detail & Related papers (2022-08-26T19:16:32Z)
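A minimal way to realize entity- and time-aware inputs is to keep the whole procedure in view while marking the current step and the entity of interest. The sketch below assumes simple marker tokens; the formatting is illustrative, not the paper's exact scheme.

```python
# Toy sketch of entity- and time-aware inputs: the full procedure stays
# in view (global) while the current step and the entity of interest
# are marked (local). Marker tokens are illustrative assumptions.

PROCEDURE = ["Pour water into the pot.", "Boil the water.", "Add the pasta."]

def build_input(entity: str, t: int) -> str:
    """One classification input per (entity, timestep) pair."""
    steps = [f"<now> {s} </now>" if i == t else s
             for i, s in enumerate(PROCEDURE)]
    return f"entity: <e> {entity} </e> | " + " ".join(steps)

for t in range(len(PROCEDURE)):
    print(build_input("water", t))
```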
- Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework [68.1678127433077]
We extend the Transformer model to dynamically conduct entity state updates and sentence realization for narrative generation.
Experiments on two narrative datasets show that our model can generate more coherent and diverse narratives than strong baselines.
arXiv Detail & Related papers (2022-08-08T09:02:19Z)
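A bare-bones version of alternating state updates and sentence realization might look like the sketch below, where each entity keeps a state vector refreshed after every generated sentence; the GRUCell update, dimensions, and random decoder stand-in are assumptions, not the paper's architecture.

```python
# Toy sketch of alternating sentence realization and entity-state
# updates: each entity keeps a state vector refreshed from every
# generated sentence. The GRUCell update and the random decoder
# stand-in are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

DIM = 32
update = nn.GRUCell(DIM, DIM)  # new_state = f(sentence_encoding, old_state)
entity_state = {"Alice": torch.zeros(1, DIM), "key": torch.zeros(1, DIM)}

def realize_sentence(states: dict) -> torch.Tensor:
    """Stand-in for a decoder conditioned on current entity states."""
    return torch.randn(1, DIM)

for _ in range(3):  # a three-sentence narrative
    sentence_encoding = realize_sentence(entity_state)
    for name in entity_state:
        entity_state[name] = update(sentence_encoding, entity_state[name])
```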