Playing Lottery Tickets with Vision and Language
- URL: http://arxiv.org/abs/2104.11832v1
- Date: Fri, 23 Apr 2021 22:24:33 GMT
- Title: Playing Lottery Tickets with Vision and Language
- Authors: Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang
Wang, Jingjing Liu
- Abstract summary: Large-scale transformer-based pre-training has revolutionized vision-and-language (V+L) research.
In parallel, work on the lottery ticket hypothesis has shown that deep neural networks contain small matching subnetworks that can achieve on-par or even better performance than the dense networks when trained in isolation.
We use UNITER, one of the best-performing V+L models, as the testbed, and consolidate 7 representative V+L tasks for experiments.
- Score: 62.6420670250559
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large-scale transformer-based pre-training has recently revolutionized
vision-and-language (V+L) research. Models such as LXMERT, ViLBERT and UNITER
have significantly lifted the state of the art over a wide range of V+L tasks.
However, the large number of parameters in such models hinders their
application in practice. In parallel, work on the lottery ticket hypothesis has
shown that deep neural networks contain small matching subnetworks that can
achieve on par or even better performance than the dense networks when trained
in isolation. In this work, we perform the first empirical study to assess
whether such trainable subnetworks also exist in pre-trained V+L models. We use
UNITER, one of the best-performing V+L models, as the testbed, and consolidate
7 representative V+L tasks for experiments, including visual question
answering, visual commonsense reasoning, visual entailment, referring
expression comprehension, image-text retrieval, GQA, and NLVR$^2$. Through
comprehensive analysis, we summarize our main findings as follows. ($i$) It is
difficult to find subnetworks (i.e., the tickets) that strictly match the
performance of the full UNITER model. However, it is encouraging to confirm
that we can find "relaxed" winning tickets at 50%-70% sparsity that maintain
99% of the full accuracy. ($ii$) Subnetworks found by task-specific pruning
transfer reasonably well to the other tasks, while those found on the
pre-training tasks at 60%/70% sparsity transfer universally, matching 98%/96%
of the full accuracy on average over all the tasks. ($iii$) Adversarial
training can be further used to enhance the performance of the found lottery
tickets.
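Such tickets are typically found with iterative magnitude pruning (IMP) and rewinding: fine-tune the pre-trained model on a task, globally prune a fraction of the smallest-magnitude weights, rewind the surviving weights to their pre-trained values, and repeat until the target sparsity (here 50%-70%) is reached. The sketch below is a minimal PyTorch illustration of that loop under stated assumptions, not the authors' released code: `load_uniter` and `fine_tune` are hypothetical helpers standing in for the UNITER checkpoint loader and one round of task-specific training with the mask applied to the weights.

```python
import copy
import torch

# Assumed helpers (not from the paper's code release):
#   load_uniter()                  -> pre-trained UNITER as a torch.nn.Module
#   fine_tune(model, task, masks)  -> model fine-tuned on `task`, with the mask
#                                     list multiplied into the weights each step

def global_magnitude_masks(model, masks, prune_frac=0.2):
    """One IMP step: prune `prune_frac` of the surviving weights, globally,
    by absolute magnitude. Returns the updated binary masks."""
    params = [p.detach() for p in model.parameters()]
    surviving = torch.cat([(p.abs() * m).flatten() for p, m in zip(params, masks)])
    surviving = surviving[surviving > 0]
    k = max(int(prune_frac * surviving.numel()), 1)
    threshold = torch.kthvalue(surviving, k).values
    return [((p.abs() * m) > threshold).float() for p, m in zip(params, masks)]

def find_ticket(task, target_sparsity=0.7, prune_frac=0.2):
    model = load_uniter()                          # pre-trained weights theta_0
    theta_0 = copy.deepcopy(model.state_dict())
    masks = [torch.ones_like(p) for p in model.parameters()]
    sparsity = 0.0
    while sparsity < target_sparsity:
        model = fine_tune(model, task, masks)      # train the masked model
        masks = global_magnitude_masks(model, masks, prune_frac)
        model.load_state_dict(theta_0)             # rewind weights to theta_0
        total = sum(m.numel() for m in masks)
        sparsity = 1.0 - sum(m.sum().item() for m in masks) / total
    return masks                                   # the (relaxed) winning ticket
```

For clarity the sketch prunes every parameter tensor; in practice embeddings, biases, and layer norms are usually left dense. Finding ($ii$) then amounts to reusing a mask found this way, or one found on the pre-training objectives, as a fixed sparsity pattern when fine-tuning on a different downstream task.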
Related papers
- LXMERT Model Compression for Visual Question Answering [0.03749861135832073]
We show that LXMERT can be effectively pruned by 40%-60% in size with only a 3% loss in accuracy.
arXiv Detail & Related papers (2023-10-23T19:46:41Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over their standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as the subnetwork that is in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating
Visio-Linguistic Reasoning [25.520406167426135]
We present TraVLR, a synthetic dataset comprising four visio-linguistic (V+L) reasoning tasks.
Each example in TraVLR redundantly encodes the scene in two modalities, allowing either to be dropped or added during training or testing without losing relevant information.
We compare the performance of four state-of-the-art V+L models, finding that while they perform well on test examples from the same modality, they all fail at cross-modal transfer.
arXiv Detail & Related papers (2021-11-21T07:22:44Z) - The Lottery Tickets Hypothesis for Supervised and Self-supervised
Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z) - The Lottery Ticket Hypothesis for Object Recognition [39.186511997089575]
The Lottery Ticket Hypothesis states that deep networks trained on large datasets contain smaller subnetworks that achieve performance on par with the dense networks.
We provide guidance on how to find lottery tickets with up to 80% overall sparsity on different sub-tasks without incurring any drop in performance.
arXiv Detail & Related papers (2020-12-08T18:59:13Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models.
arXiv Detail & Related papers (2020-07-23T19:35:39Z)