Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
- URL: http://arxiv.org/abs/2403.09704v1
- Date: Fri, 8 Mar 2024 21:26:49 GMT
- Title: Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
- Authors: Swapnaja Achintalwar, Ioana Baldini, Djallel Bouneffouf, Joan Byamugisha, Maria Chang, Pierre Dognin, Eitan Farchi, Ndivhuwo Makondo, Aleksandra Mojsilovic, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Inkit Padhi, Orna Raz, Jesus Rios, Prasanna Sattigeri, Moninder Singh, Siphiwe Thwala, Rosario A. Uceda-Sosa, Kush R. Varshney
- Abstract summary: We present an approach that empowers application developers to tune a model to their particular values, social norms, laws and other regulations.
We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors.
- Score: 61.141986747544024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.
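As a rough illustration of how the three components could be orchestrated around an off-the-shelf chat model, the sketch below wires a hypothetical Framers/Instructors/Auditors pipeline together. The class names follow the abstract, but every signature, the prompt-preamble steering, and the keyword-based audit are assumptions made here for illustration, not the paper's implementation.

```python
from typing import Callable, List

GenerateFn = Callable[[str], str]  # any text-in / text-out language model


class Framers:
    """Turn a policy document into explicit, model-usable rules (assumed interface)."""

    def extract_rules(self, policy_text: str) -> List[str]:
        # Placeholder: a real framer would mine instructions and knowledge
        # from the business conduct guidelines.
        return [line.strip("- ").strip() for line in policy_text.splitlines() if line.strip()]


class Instructors:
    """Steer the model toward the extracted rules (assumed interface)."""

    def wrap(self, generate: GenerateFn, rules: List[str]) -> GenerateFn:
        preamble = "Follow these guidelines:\n" + "\n".join(f"- {r}" for r in rules)
        return lambda prompt: generate(f"{preamble}\n\nUser: {prompt}\nAssistant:")


class Auditors:
    """Check a response for violations of the regulations (assumed interface)."""

    def violations(self, response: str) -> List[str]:
        # Placeholder keyword screen; a real auditor would use trained
        # detectors and red-teaming probes.
        flagged = ["confidential", "insider information"]
        return [term for term in flagged if term in response.lower()]


def aligned_chatbot(generate: GenerateFn, policy_text: str) -> GenerateFn:
    """Compose the three components into a guideline-following chat function."""
    rules = Framers().extract_rules(policy_text)
    steered = Instructors().wrap(generate, rules)
    auditors = Auditors()

    def respond(prompt: str) -> str:
        answer = steered(prompt)
        if auditors.violations(answer):
            return "I can't help with that under our business conduct guidelines."
        return answer

    return respond
```

A stub such as `generate = lambda p: "..."` is enough to exercise the control flow; in a deployment the Instructors stage would more plausibly be supervised fine-tuning or preference tuning rather than a prompt preamble.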
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z) - The Problem of Alignment [1.2277343096128712]
Large Language Models produce sequences learned as statistical patterns from large corpora.
After initial training, models must be aligned with human values, preferring certain continuations over others.
We examine this practice of structuration as a two-way interaction between users and models.
arXiv Detail & Related papers (2023-12-30T11:44:59Z) - Unifying Corroborative and Contributive Attributions in Large Language
Models [11.376900121298517]
We argue for and present a unified framework of large language model attributions.
We show how existing methods of different types of attribution fall under the unified framework.
We believe that this unified framework will guide the use case driven development of systems that leverage both types of attribution, as well as the standardization of their evaluation.
arXiv Detail & Related papers (2023-11-20T23:17:20Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Controllable Text Generation with Language Constraints [39.741059642044874]
We consider the task of text generation in language models with constraints specified in natural language.
Our benchmark contains knowledge-intensive constraints sourced from databases like WordNet and Wikidata.
We propose a solution to leverage a language model's own internal knowledge to guide generation.
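A minimal way to picture this idea is a generate-then-verify loop in which the same model is asked whether its draft satisfies the natural-language constraint. The prompts and the retry loop below are assumptions for illustration, not the paper's algorithm.

```python
from typing import Callable

GenerateFn = Callable[[str], str]  # any text-in / text-out language model


def constrained_generate(generate: GenerateFn, task: str, constraint: str,
                         max_tries: int = 3) -> str:
    """Draft an answer, then ask the model itself to check the constraint."""
    draft = ""
    for _ in range(max_tries):
        draft = generate(f"{task}\nConstraint: {constraint}\nAnswer:")
        verdict = generate(
            f"Constraint: {constraint}\nText: {draft}\n"
            "Does the text satisfy the constraint? Answer yes or no:"
        )
        if verdict.strip().lower().startswith("yes"):
            return draft
    return draft  # fall back to the last draft if no attempt passes the check
```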
arXiv Detail & Related papers (2022-12-20T17:39:21Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - The Whole Truth and Nothing But the Truth: Faithful and Controllable
Dialogue Response Generation with Dataflow Transduction and Constrained
Decoding [65.34601470417967]
We describe a hybrid architecture for dialogue response generation that combines the strengths of neural language modeling and rule-based generation.
Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
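One way to see the hybrid idea at a small scale: rule-based templates guarantee faithfulness to the dialogue state, and the neural model only ranks among those faithful candidates. The simplified sketch below is an assumption for illustration, not the paper's dataflow-transduction system.

```python
from typing import Callable, Dict, List

ScoreFn = Callable[[str], float]  # higher score = more fluent under the neural LM


def faithful_response(score: ScoreFn, state: Dict[str, str]) -> str:
    """Rule-based candidates (always faithful) ranked by a neural fluency score."""
    event, time = state["event"], state["time"]
    candidates: List[str] = [
        f"Your {event} is scheduled at {time}.",
        f"I found the {event} at {time}.",
        f"The {event} starts at {time}.",
    ]
    return max(candidates, key=score)
```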
arXiv Detail & Related papers (2022-09-16T09:00:49Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - Customizing Contextualized Language Models forLegal Document Reviews [0.22940141855172028]
We show how different language models trained on general-domain corpora can be best customized for legal document reviewing tasks.
We compare their efficiency with respect to task performance and present practical considerations.
arXiv Detail & Related papers (2021-02-10T22:14:15Z)