Discrete Key-Value Bottleneck
- URL: http://arxiv.org/abs/2207.11240v3
- Date: Mon, 12 Jun 2023 15:30:22 GMT
- Title: Discrete Key-Value Bottleneck
- Authors: Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf
- Abstract summary: Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging: a large number of weights need to be fine-tuned, and as a result, the encoders forget information about the previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
- Score: 95.61236311369821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks perform well on classification tasks where data streams
are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary
training data streams such as continual learning. One powerful approach that
has addressed this challenge involves pre-training of large encoders on volumes
of readily available data, followed by task-specific tuning. Given a new task,
however, updating the weights of these encoders is challenging as a large
number of weights need to be fine-tuned, and as a result, the encoders forget
information about the previous tasks. In the present work, we propose a model
architecture to address this issue, building upon a discrete bottleneck
containing pairs of separate and learnable key-value codes. Our paradigm will
be to encode; process the representation via a discrete bottleneck; and decode.
Here, the input is fed to the pre-trained encoder, the output of the encoder is
used to select the nearest keys, and the corresponding values are fed to the
decoder to solve the current task. The model can only fetch and re-use a sparse
subset of these key-value pairs during inference, enabling localized and
context-dependent model updates. We theoretically investigate the ability of
the discrete key-value bottleneck to minimize the effect of learning under
distribution shifts and show that it reduces the complexity of the hypothesis
class. We empirically verify the proposed method under challenging
class-incremental learning scenarios and show that the proposed model - without
any task boundaries - reduces catastrophic forgetting across a wide variety of
pre-trained models, outperforming relevant baselines on this task.
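The abstract describes an encode, discretize, decode pipeline: frozen encoder features select their nearest keys, and only the corresponding learnable values are passed to the decoder. The snippet below is a minimal, simplified sketch of that idea in PyTorch; the single flat codebook, top-1 key lookup, tensor shapes, and linear decoder are illustrative assumptions made for brevity, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Simplified single-codebook sketch: nearest-key lookup, learnable values, small decoder."""
    def __init__(self, feat_dim: int = 512, num_pairs: int = 4096,
                 value_dim: int = 32, num_classes: int = 10):
        super().__init__()
        # Keys receive no gradients (the "separate" codes in the abstract);
        # how they are initialized or adapted is left out of this sketch.
        self.keys = nn.Parameter(torch.randn(num_pairs, feat_dim), requires_grad=False)
        # Values are the learnable codes that carry task information.
        self.values = nn.Parameter(0.01 * torch.randn(num_pairs, value_dim))
        self.decoder = nn.Linear(value_dim, num_classes)  # simple task head

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (B, feat_dim) output of a frozen pre-trained encoder.
        dists = torch.cdist(z, self.keys)     # (B, num_pairs): distance to every key
        idx = dists.argmin(dim=-1)            # nearest key per input (sparse fetch)
        fetched = self.values[idx]            # (B, value_dim): corresponding values
        return self.decoder(fetched)          # decode the values to solve the task

# Usage with a stand-in for frozen encoder features:
features = torch.randn(8, 512)
logits = DiscreteKeyValueBottleneck()(features)
```

Because gradients only reach the fetched values and the decoder, an update triggered by one input leaves all non-selected key-value pairs untouched, which is the localized, context-dependent update property the abstract highlights.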
Related papers
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to equip deep neural networks with the ability to learn sample relationships from each mini-batch.
BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training (see the sketch after this list).
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
- Lifelong Learning Without a Task Oracle [13.331659934508764]
Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned.
We propose and compare several candidate task-assigning mappers which require very little memory overhead.
Best-performing variants only impose an average cost of 1.7% parameter memory increase.
arXiv Detail & Related papers (2020-11-09T21:30:31Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
- Conditional Mutual Information-based Contrastive Loss for Financial Time Series Forecasting [12.0855096102517]
We present a representation learning framework for financial time series forecasting.
In this paper, we propose to first learn compact representations from time series data, then use the learned representations to train a simpler model for predicting time series movements.
arXiv Detail & Related papers (2020-02-18T15:24:33Z)
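As referenced in the BatchFormer entry above, here is a minimal, illustrative sketch (not the authors' exact module) of the batch-dimension attention idea: a standard Transformer encoder layer applied across the samples of a mini-batch so that each sample's features can attend to the others during training. The class name, feature dimension, and usage below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class BatchAttention(nn.Module):
    """Hypothetical module: a Transformer encoder layer run across the batch dimension."""
    def __init__(self, dim: int = 512, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, D) per-sample features from a backbone.
        # Reshape to (sequence_length=B, batch_size=1, D) so self-attention
        # runs across the B samples of the mini-batch.
        out = self.encoder(feats.unsqueeze(1))
        return out.squeeze(1)  # back to (B, D)

# Training-time usage: each refined feature now depends on the whole mini-batch.
feats = torch.randn(32, 512)
refined = BatchAttention()(feats)
```

At test time such a module is typically bypassed so that a prediction for a single input does not depend on the rest of the batch.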
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.