disco: a toolkit for Distributional Control of Generative Models
- URL: http://arxiv.org/abs/2303.05431v1
- Date: Wed, 8 Mar 2023 18:58:52 GMT
- Title: disco: a toolkit for Distributional Control of Generative Models
- Authors: Germán Kruszewski, Jos Rozen, Marc Dymetman
- Abstract summary: We present disco, an open-source Python library that brings distributional control techniques to the broader public.
Despite their potential, the widespread adoption of these techniques has been hindered by the difficulty in adapting complex, disconnected code.
- Score: 4.662591864499645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models and other generative models have revolutionized
NLP and beyond. However, these models tend to reproduce undesirable biases
present in their training data. Also, they may overlook patterns that are
important but challenging to capture. To address these limitations, researchers
have introduced distributional control techniques. These techniques, not
limited to language, allow controlling the prevalence (i.e., expectations) of
any features of interest in the model's outputs. Despite their potential, the
widespread adoption of these techniques has been hindered by the difficulty in
adapting complex, disconnected code. Here, we present disco, an open-source
Python library that brings these techniques to the broader public.
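To make the idea concrete, here is a minimal, library-agnostic sketch of distributional control, not disco's actual API: the controlled model is an energy-based model p(x) ∝ base(x) · exp(λ · φ(x)) for a feature φ, with λ solved so that the expectation of φ matches a target. The toy base distribution, the feature function, and the target value are all illustrative assumptions.

```python
import math

# Toy stand-in for a generative model: an enumerable categorical
# distribution over strings (real use cases involve a pre-trained LM).
base = {"a cat": 0.5, "a dog": 0.3, "an amusing cat": 0.2}

def feature(x):
    # Binary feature of interest, e.g. "the output mentions a cat".
    return 1.0 if "cat" in x else 0.0

target_moment = 0.9  # desired E[feature] under the controlled model

def controlled(lam):
    # EBM reweighting: p(x) is proportional to base(x) * exp(lam * feature(x)).
    w = {x: p * math.exp(lam * feature(x)) for x, p in base.items()}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}

def moment(lam):
    # Expectation of the feature under the reweighted distribution.
    return sum(p * feature(x) for x, p in controlled(lam).items())

# The moment is monotone in lam, so bisection finds the right multiplier.
lo, hi = -20.0, 20.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if moment(mid) < target_moment else (lo, mid)

print(controlled((lo + hi) / 2))  # "cat" outputs now carry ~90% of the mass
```

With a real language model the reweighted distribution cannot be enumerated; approximating it, e.g. by importance sampling or by fine-tuning a policy toward the EBM, is the kind of machinery a toolkit like disco packages up.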
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
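The sampling scheme this entry describes can be illustrated with a generic masked LM: repeatedly mask a random position and resample it from the model's conditional, a Gibbs-style Markov chain. The sketch below is a hedged stand-in using the Hugging Face transformers library and bert-base-uncased, not the paper's T5-based setup.

```python
import random
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def gibbs_step(ids):
    # Mask one interior position and resample it from the conditional.
    i = random.randrange(1, len(ids) - 1)  # skip [CLS] and [SEP]
    masked = ids.clone()
    masked[i] = tok.mask_token_id
    with torch.no_grad():
        logits = mlm(masked.unsqueeze(0)).logits[0, i]
    ids[i] = torch.multinomial(logits.softmax(-1), 1).item()
    return ids

ids = tok("the cat sat on the mat", return_tensors="pt").input_ids[0]
for _ in range(50):        # run the chain; its stationary distribution
    ids = gibbs_step(ids)  # approximates the model's joint
print(tok.decode(ids[1:-1]))
```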
- Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
arXiv Detail & Related papers (2024-04-02T21:51:39Z)
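The degeneration result can be reproduced in miniature with a toy closed loop, an illustrative assumption rather than the paper's setting: fit a Gaussian, sample from it at temperature below 1, refit on the samples, and repeat. Without external data the standard deviation shrinks geometrically; raising external_fraction counteracts the collapse.

```python
import random
import statistics

external = [random.gauss(0.0, 1.0) for _ in range(1000)]  # "real" data
mu, sigma = 0.0, 1.0
temperature = 0.9        # any non-trivial temperature drives degeneration
external_fraction = 0.0  # try 0.2 to see fresh data stabilize the loop

for generation in range(30):
    # Sample from the current model at reduced temperature, then refit.
    synthetic = [random.gauss(mu, sigma * temperature) for _ in range(1000)]
    k = int(external_fraction * len(synthetic))
    data = synthetic[k:] + random.sample(external, k)
    mu, sigma = statistics.fmean(data), statistics.stdev(data)

print(f"std after 30 generations: {sigma:.3f}")  # ~0.9**30 with no external data
```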
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z)
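One concrete attribution strategy consistent with this entry, offered as a hedged guess rather than the paper's exact method: score text produced by the fine-tuned model under each candidate base LM and attribute it to the one assigning the lowest perplexity. The gpt2/distilgpt2 candidates and the sample text are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

candidates = ["gpt2", "distilgpt2"]  # placeholder pre-trained bases

def perplexity(model_name, text):
    # Perplexity of `text` under the candidate base model.
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

sample = "Text sampled from the fine-tuned model under investigation."
scores = {name: perplexity(name, sample) for name in candidates}
print(min(scores, key=scores.get), scores)  # lowest perplexity wins
```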
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
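The recipe behind this entry combines a weak, bias-prone learner with the main model. The sketch below shows a standard product-of-experts combination in generic PyTorch; the shapes and models are placeholders, and treating this as the paper's exact loss is an assumption.

```python
import torch
import torch.nn.functional as F

def poe_loss(main_logits, weak_logits, labels):
    # Product of experts at training time: the frozen weak model's
    # log-probabilities are added to the main model's logits, so the
    # main model is pushed to explain what the weak model cannot.
    combined = main_logits + F.log_softmax(weak_logits, dim=-1)
    return F.cross_entropy(combined, labels)

main_logits = torch.randn(8, 3, requires_grad=True)  # main model outputs
weak_logits = torch.randn(8, 3)                      # frozen weak model outputs
labels = torch.randint(0, 3, (8,))

loss = poe_loss(main_logits, weak_logits.detach(), labels)
loss.backward()  # gradients flow only into the main model
# At test time, predictions use the main model's logits alone.
```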
- A Simple and Interpretable Predictive Model for Healthcare [0.0]
Deep learning models are currently dominating most state-of-the-art solutions for disease prediction.
These deep learning models, with trainable parameters running into millions, require huge amounts of compute and data to train and deploy.
We develop a simpler yet interpretable non-deep learning based model for application to EHR data.
arXiv Detail & Related papers (2020-07-27T08:13:37Z)
- Posterior Control of Blackbox Generation [126.33511630879713]
We consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.
We find that this method improves over standard benchmarks, while also providing fine-grained control.
arXiv Detail & Related papers (2020-05-10T03:22:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.