Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
- URL: http://arxiv.org/abs/2404.10282v2
- Date: Fri, 24 May 2024 20:52:02 GMT
- Title: Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
- Authors: Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu
- Abstract summary: In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature.
In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits.
We propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives.
The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks.
- Score: 52.70210390424605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.
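The first of the three inductive biases, data compression into a grid-like latent space via quantization, can be illustrated with a minimal sketch: each continuous latent dimension is snapped to its nearest value in a small per-dimension codebook. The function and variable names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_latents(z, codebooks):
    """Snap each latent dimension to its nearest scalar code value.

    z: (batch, d) continuous latents from an encoder.
    codebooks: (d, k) array holding k candidate scalar values per
    dimension; together they form a grid-like latent space in the
    spirit of the abstract. Names here are illustrative only.
    """
    # Distance from every latent scalar to every candidate value.
    dists = np.abs(z[:, :, None] - codebooks[None, :, :])  # (batch, d, k)
    idx = dists.argmin(axis=-1)                            # (batch, d)
    # Pick the winning code value per (sample, dimension) pair.
    return codebooks[np.arange(z.shape[1])[None, :], idx]
```

Because every sample is mapped onto the same discrete grid, downstream regularizers (such as the independence and decoder-influence terms) operate on a far smaller, more structured set of latent configurations.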
Related papers
- Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion [26.979291099052194]
We introduce Cocoon, an object- and feature-level uncertainty-aware fusion framework.
Its key innovation is uncertainty quantification for heterogeneous representations.
Cocoon consistently outperforms existing static and adaptive methods in both normal and challenging conditions.
arXiv Detail & Related papers (2024-10-16T14:10:53Z) - Multi-threshold Deep Metric Learning for Facial Expression Recognition [60.26967776920412]
We present the multi-threshold deep metric learning technique, which avoids the difficult threshold validation.
We find that each threshold of the triplet loss intrinsically determines a distinctive distribution of inter-class variations.
This makes the embedding layer, which consists of a set of slices, a more informative and discriminative feature.
arXiv Detail & Related papers (2024-06-24T08:27:31Z) - Steering Language Generation: Harnessing Contrastive Expert Guidance and Negative Prompting for Coherent and Diverse Synthetic Data Generation [0.0]
Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility.
We introduce contrastive expert guidance, where the difference between the logit distributions of fine-tuned and base language models is emphasised.
We term this dual-pronged approach to logit reshaping STEER: Semantic Text Enhancement via Embedding Repositioning.
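The logit-difference idea described above can be sketched as follows. The guidance weight `gamma` and the function name are hypothetical illustrations, not details from the paper.

```python
import numpy as np

def contrastive_guidance(expert_logits, base_logits, gamma=1.0):
    """Reshape logits by emphasising the fine-tuned expert's shift
    away from the base model.

    gamma is a hypothetical guidance weight (not from the paper);
    gamma = 0 recovers the expert's own logits unchanged.
    """
    return expert_logits + gamma * (expert_logits - base_logits)
```

Tokens the expert ranks higher than the base model get boosted, while tokens both models agree on are left largely untouched.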
arXiv Detail & Related papers (2023-08-15T08:49:14Z) - Expressive Monotonic Neural Networks [1.0128808054306184]
The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior.
We propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence in any subset of the inputs.
We show how the algorithm is used to train powerful, robust, and interpretable discriminators that achieve competitive performance.
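The weight-constraint idea can be sketched minimally as below. This is an illustrative stand-in under stated assumptions: the paper's actual architecture additionally uses a single residual connection and applies the constraint only to a chosen subset of inputs.

```python
import numpy as np

def monotone_mlp(x, weights, biases):
    """Toy forward pass that is non-decreasing in every input.

    Squaring each weight entry keeps every effective weight
    non-negative, and ReLU is non-decreasing, so the composition
    is monotone. Names and details here are illustrative, not the
    authors' code.
    """
    h = x
    for W, b in zip(weights, biases):
        h = np.maximum(h @ (W ** 2) + b, 0.0)  # non-negative weights + ReLU
    return h
```

Increasing any input coordinate can only increase (or leave unchanged) the output, which is the exact monotonic dependence the abstract refers to.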
arXiv Detail & Related papers (2023-07-14T17:59:53Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Improving Neural Additive Models with Bayesian Principles [54.29602161803093]
Neural additive models (NAMs) enhance the transparency of deep neural networks by handling calibrated input features in separate additive sub-networks.
We develop Laplace-approximated NAMs (LA-NAMs) which show improved empirical performance on datasets and challenging real-world medical tasks.
arXiv Detail & Related papers (2023-05-26T13:19:15Z) - Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Are Negative Samples Necessary in Entity Alignment? An Approach with High Performance, Scalability and Robustness [26.04006507181558]
We propose a novel EA method with three new components to enable high performance, high scalability, and high robustness.
We conduct detailed experiments on several public datasets to examine the effectiveness and efficiency of our proposed method.
arXiv Detail & Related papers (2021-08-11T15:20:41Z) - LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [30.610670366488943]
We replace architecture engineering by encoding inductive bias in datasets.
Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities.
Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks.
arXiv Detail & Related papers (2021-01-15T17:15:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.