RevUp: Revise and Update Information Bottleneck for Event Representation
- URL: http://arxiv.org/abs/2205.12248v1
- Date: Tue, 24 May 2022 17:54:59 GMT
- Title: RevUp: Revise and Update Information Bottleneck for Event Representation
- Authors: Mehdi Rezaee and Francis Ferraro
- Abstract summary: In machine learning, latent variables play a key role in capturing the underlying structure of data, but they are typically learned without supervision.
We propose a semi-supervised information bottleneck-based model that enables the use of side knowledge to direct the learning of discrete latent variables.
We show that our approach generalizes an existing method of parameter injection, and perform an empirical case study of our approach on language-based event modeling.
- Score: 16.54912614895861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, latent variables play a key role in capturing the
underlying structure of data, but they are typically learned without supervision.
When side knowledge that already carries high-level information about the input data
is available, we can use that source to guide the latent variables and capture the
available background information in a process called "parameter injection." To that
end, we propose a semi-supervised information-bottleneck-based model that enables the
use of side knowledge, even if it is noisy and imperfect, to direct the learning of
discrete latent variables. Fundamentally, we introduce an auxiliary continuous latent
variable as a way to reparameterize the model's discrete variables with a
light-weight hierarchical structure. With this reparameterization, the model's
discrete latent variables are learned to minimize the mutual information between the
observed data and optional side knowledge that is not already captured by the new,
auxiliary variables. We theoretically show that our approach generalizes an existing
method of parameter injection, and we perform an empirical case study of our approach
on language-based event modeling. We corroborate our theoretical results with strong
empirical experiments, showing that the proposed method outperforms previously
proposed approaches on multiple datasets.
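For orientation, below is a minimal sketch of the classic information-bottleneck template that this family of models builds on. Treating the side knowledge s as the relevance variable is our reading of the abstract, and the notation (x for the observed data, z for the discrete latent, beta for the trade-off weight) is ours, not the paper's; a runnable sketch of the hierarchical reparameterization appears after the related-papers list.

```latex
% Classic information-bottleneck template (our notation, a hedged reading,
% not the paper's exact objective): compress the observed data x into a
% representation z while preserving information about a relevance variable;
% in RevUp's setting the side knowledge s plausibly plays that role.
\min_{p(z \mid x)} \; I(x; z) \;-\; \beta \, I(z; s)
```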
Related papers
- Information theory for data-driven model reduction in physics and biology [0.0]
We develop a systematic approach based on the information bottleneck to identify the relevant variables.
We show that in the limit of high compression, the relevant variables are directly determined by the slowest-decaying eigenfunctions.
This provides a firm foundation for constructing interpretable deep learning tools that perform model reduction.
arXiv Detail & Related papers (2023-12-11T18:39:05Z)
- Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models from a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- FUNCK: Information Funnels and Bottlenecks for Invariant Representation Learning [7.804994311050265]
We investigate a set of related information funnels and bottleneck problems that claim to learn invariant representations from the data.
We propose a new element to this family of information-theoretic objectives: The Conditional Privacy Funnel with Side Information.
Given the generally intractable objectives, we derive tractable approximations using amortized variational inference parameterized by neural networks.
arXiv Detail & Related papers (2022-11-02T19:37:55Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
arXiv Detail & Related papers (2021-06-25T16:34:05Z)
- MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
- Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure [3.0036519884678894]
We propose new methods to support model analysis by exploiting the information about the correlation between variables.
We show how to analyze groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically.
We also present a new type of model visualisation, the triplot, which exploits the hierarchical structure of variable grouping to produce a high-information-density view of the model.
arXiv Detail & Related papers (2021-04-07T21:29:03Z)
- Variational Mutual Information Maximization Framework for VAE Latent Codes with Continuous and Discrete Priors [5.317548969642376]
The Variational Autoencoder (VAE) is a scalable method for learning directed latent variable models of complex data, but its latent codes can end up carrying little information about the inputs.
We propose a Variational Mutual Information Maximization Framework for VAEs to address this issue.
arXiv Detail & Related papers (2020-06-02T09:05:51Z)
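As a concrete companion to the RevUp abstract above, here is a minimal PyTorch sketch of a discrete latent variable reparameterized through an auxiliary continuous latent, i.e., the light-weight hierarchy the abstract describes. It is a generic illustration under our own naming (HierarchicalDiscreteEncoder, to_c, to_logits, and all dimensions are assumptions), not the authors' implementation, and it omits the mutual-information terms of the training objective.

```python
# Hedged sketch, not the paper's code: a discrete latent z reparameterized
# through an auxiliary continuous latent c, following the hierarchy
# (x, s) -> c -> z suggested by the abstract. All names are our own.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDiscreteEncoder(nn.Module):
    def __init__(self, x_dim: int, s_dim: int, c_dim: int, n_classes: int):
        super().__init__()
        # q(c | x, s): a diagonal-Gaussian auxiliary continuous latent
        self.to_c = nn.Linear(x_dim + s_dim, 2 * c_dim)
        # q(z | c): a discrete latent, relaxed with Gumbel-Softmax
        self.to_logits = nn.Linear(c_dim, n_classes)

    def forward(self, x, s, tau: float = 1.0):
        # Gaussian reparameterization trick for the auxiliary variable c
        mu, log_var = self.to_c(torch.cat([x, s], dim=-1)).chunk(2, dim=-1)
        c = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        # Straight-through Gumbel-Softmax: one-hot z, differentiable w.r.t. logits
        z = F.gumbel_softmax(self.to_logits(c), tau=tau, hard=True)
        return z, c, mu, log_var

# Usage: side knowledge s is optional in spirit; a zero vector could stand
# in when it is absent (our assumption, not the paper's recipe).
enc = HierarchicalDiscreteEncoder(x_dim=64, s_dim=8, c_dim=16, n_classes=10)
x, s = torch.randn(4, 64), torch.randn(4, 8)
z, c, mu, log_var = enc(x, s)
print(z.shape)  # torch.Size([4, 10])
```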
This list is automatically generated from the titles and abstracts of the papers on this site.