Matryoshka Representation Learning
- URL: http://arxiv.org/abs/2205.13147v4
- Date: Thu, 8 Feb 2024 03:21:26 GMT
- Title: Matryoshka Representation Learning
- Authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford,
Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham
Kakade, Prateek Jain, Ali Farhadi
- Abstract summary: Matryoshka Representation Learning allows a single embedding to adapt to the computational constraints of downstream tasks.
MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations.
MRL extends seamlessly to web-scale datasets across modalities -- vision (ViT, ResNet), vision + language (ALIGN), and language (BERT).
- Score: 43.62026091806627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned representations are a central component in modern ML systems, serving
a multitude of downstream tasks. When training such representations, it is
often the case that computational and statistical constraints for each
downstream task are unknown. In this context, rigid, fixed-capacity
representations can be either over- or under-accommodating to the task at hand.
This leads us to ask: can we design a flexible representation that can adapt to
multiple downstream tasks with varying computational resources? Our main
contribution is Matryoshka Representation Learning (MRL) which encodes
information at different granularities and allows a single embedding to adapt
to the computational constraints of downstream tasks. MRL minimally modifies
existing representation learning pipelines and imposes no additional cost
during inference and deployment. MRL learns coarse-to-fine representations that
are at least as accurate and rich as independently trained low-dimensional
representations. The flexibility within the learned Matryoshka Representations
offers: (a) up to 14x smaller embedding size for ImageNet-1K classification at
the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale
retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for
long-tail few-shot classification, all while being as robust as the original
representations. Finally, we show that MRL extends seamlessly to web-scale
datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet),
vision + language (ALIGN) and language (BERT). MRL code and pretrained models
are open-sourced at https://github.com/RAIVNLab/MRL.
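The repository above holds the official implementation; the snippet below is only a minimal PyTorch sketch of the core idea described in the abstract -- attaching a classifier to each nested prefix of a single embedding and summing the losses so coarse prefixes stay as accurate as independently trained low-dimensional embeddings. The names (MatryoshkaHead, nesting_dims) are illustrative and not taken from the released MRL code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatryoshkaHead(nn.Module):
    """Illustrative head: one linear classifier per nested prefix of the embedding."""
    def __init__(self, embed_dim, num_classes,
                 nesting_dims=(8, 16, 32, 64, 128, 256, 512, 1024, 2048)):
        super().__init__()
        # Keep only granularities that fit inside the backbone's embedding size.
        self.nesting_dims = [d for d in nesting_dims if d <= embed_dim]
        self.classifiers = nn.ModuleList(
            [nn.Linear(d, num_classes) for d in self.nesting_dims]
        )

    def forward(self, z):
        # z: (batch, embed_dim) embedding from any backbone (ViT, ResNet, BERT, ...).
        # Each classifier sees only the first d coordinates of the same embedding.
        return [clf(z[:, :d]) for d, clf in zip(self.nesting_dims, self.classifiers)]

def matryoshka_loss(logits_per_dim, targets):
    # Sum (optionally weight) the cross-entropy at every granularity, so a single
    # forward pass supervises all nested representations at once.
    return sum(F.cross_entropy(logits, targets) for logits in logits_per_dim)
```

At deployment, the same embedding can simply be truncated to its first d coordinates (and re-normalized if needed), which is what enables the adaptive classification and retrieval speed-ups reported above.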
Related papers
- MMRL: Multi-Modal Representation Learning for Vision-Language Models [4.828668077793944]
Multi-Modal Representation Learning (MMRL) is a framework that introduces a shared, learnable, and modality-agnostic representation space.
MMRL projects the space tokens to text and image representation tokens, facilitating more effective multi-modal interactions.
Experiments across 15 datasets demonstrate that MMRL outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-03-11T14:48:01Z)
- Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation [42.590255022001145]
Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths.
We show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity.
arXiv Detail & Related papers (2025-03-03T17:59:48Z)
- Efficient Multimodal Learning from Data-centric Perspective [21.35857180519653]
We introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning.
Experiments show that our Bunny-4B/8B outperforms the state-of-the-art large MLLMs on multiple benchmarks.
arXiv Detail & Related papers (2024-02-18T10:09:10Z)
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs [49.88461345825586]
This paper proposes a new framework to enhance the fine-grained image understanding abilities of MLLMs.
We present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
We show that our model exhibits a 5.2% accuracy improvement over Qwen-VL and surpasses the accuracy of Kosmos-2 by 24.7%.
arXiv Detail & Related papers (2023-10-01T05:53:15Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the heavy data demands of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Reinforcement Learning Friendly Vision-Language Model for Minecraft [31.863271032186038]
We propose a novel cross-modal contrastive learning framework architecture, CLIP4MC.
We aim to learn a reinforcement learning (RL) friendly vision-language model (VLM) that serves as an intrinsic reward function for open-ended tasks.
We demonstrate that the proposed method achieves better performance on RL tasks compared with baselines.
arXiv Detail & Related papers (2023-03-19T05:20:52Z)
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z)
- X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation [71.51719469058666]
We propose a representation learning framework called X-Learner.
X-Learner learns the universal feature of multiple vision tasks supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities and computational costs.
arXiv Detail & Related papers (2022-03-16T17:23:26Z)
- Provable and Efficient Continual Representation Learning [40.78975699391065]
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.
We study the problem of continual representation learning where we learn an evolving representation as new tasks arrive.
We show that CL benefits when the initial tasks have a large sample size and high "representation diversity".
arXiv Detail & Related papers (2022-03-03T21:23:08Z)
- RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real [74.45688231140689]
We introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.
We obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning.
arXiv Detail & Related papers (2020-06-16T08:58:07Z)