Towards Parameter-Efficient Integration of Pre-Trained Language Models
In Temporal Video Grounding
- URL: http://arxiv.org/abs/2209.13359v2
- Date: Thu, 25 May 2023 08:50:45 GMT
- Title: Towards Parameter-Efficient Integration of Pre-Trained Language Models
In Temporal Video Grounding
- Authors: Erica K. Shimomoto, Edison Marrese-Taylor, Hiroya Takamura, Ichiro
Kobayashi, Hideki Nakayama, Yusuke Miyao
- Abstract summary: This paper explores the task of Temporal Video Grounding (TVG), where, given an untrimmed video and a natural language sentence query, the goal is to recognize and determine the temporal boundaries of the action instances described by the query.
Recent works have tackled this task by improving the query inputs with large pre-trained language models (PLMs), at the cost of more expensive training.
- Score: 37.199310579532884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the task of Temporal Video Grounding (TVG) where, given
an untrimmed video and a natural language sentence query, the goal is to
recognize and determine temporal boundaries of action instances in the video
described by the query. Recent works tackled this task by improving query
inputs with large pre-trained language models (PLM) at the cost of more
expensive training. However, the effects of this integration are unclear, as
these works also propose improvements in the visual inputs. Therefore, this
paper studies the effects of PLMs in TVG and assesses the applicability of
parameter-efficient training with NLP adapters. We couple popular PLMs with a
selection of existing approaches and test different adapters to reduce the
impact of the additional parameters. Our results on three challenging datasets
show that, without changing the visual inputs, TVG models greatly benefited
from the PLM integration and fine-tuning, stressing the importance of sentence
query representation in this task. Furthermore, NLP adapters were an effective
alternative to full fine-tuning, even though they were not tailored to our
task, allowing PLM integration in larger TVG models and delivering results
comparable to SOTA models. Finally, our results shed light on which adapters
work best in different scenarios.
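Below is a minimal PyTorch sketch of the kind of parameter-efficient PLM integration the abstract describes: a frozen BERT query encoder with small bottleneck adapters, where only the adapters (and any downstream grounding head) receive gradients. The module names and the pooling choice are illustrative assumptions, not the paper's exact architecture; adapter libraries such as AdapterHub insert the adapters inside each transformer layer rather than after it, as done here for simplicity.

```python
# Sketch: frozen PLM + bottleneck adapters for sentence-query encoding in TVG.
# Assumptions: HuggingFace BERT backbone, adapters applied to each layer's
# output (simplified), mean pooling over tokens and layers.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class AdapterBertQueryEncoder(nn.Module):
    """Frozen BERT encoder with one trainable adapter per layer output."""

    def __init__(self, name: str = "bert-base-uncased", bottleneck: int = 64):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)
        for p in self.bert.parameters():  # freeze the PLM; adapters stay trainable
            p.requires_grad = False
        hidden = self.bert.config.hidden_size
        self.adapters = nn.ModuleList(
            [BottleneckAdapter(hidden, bottleneck)
             for _ in range(self.bert.config.num_hidden_layers)]
        )

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids, attention_mask=attention_mask,
                        output_hidden_states=True)
        # Adapt each layer's hidden states, then average over layers and tokens.
        # This pooling is a simple stand-in; the real coupling depends on the
        # TVG model the query representation is fed into.
        adapted = [a(h) for a, h in zip(self.adapters, out.hidden_states[1:])]
        return torch.stack(adapted, dim=0).mean(dim=(0, 2))  # (batch, hidden)


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AdapterBertQueryEncoder()
batch = tokenizer(["person opens the fridge door"], return_tensors="pt")
print(encoder(batch["input_ids"], batch["attention_mask"]).shape)  # [1, 768]
```

In practice, which layers to adapt and how the query embedding is fused with the visual stream depend on the specific TVG model the PLM is coupled with.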
Related papers
- Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT [0.8702432681310401]
We aim to enhance the generation performance of an LLM by specialising it using unstructured text corpora.
We find that increasing the number of trainable parameters leads to better and more robust language adaptation.
Although improvements are consistent in 0-shot summarisation, some adapted models struggle with longer context lengths.
arXiv Detail & Related papers (2024-12-17T08:44:00Z)
- Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves [123.07450481623124]
We propose Skip Tuning as a novel paradigm for adapting vision-language models to downstream tasks.
Unlike existing PT or adapter-based methods, Skip Tuning applies Layer-wise Skipping (LSkip) and Class-wise Skipping (CSkip) upon the FT baseline without introducing extra context vectors or adapter modules.
arXiv Detail & Related papers (2024-12-16T07:33:23Z)
- Parameter-Efficient Fine-Tuning With Adapters [5.948206235442328]
This research introduces a novel adaptation method utilizing the UniPELT framework as a base.
Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters.
arXiv Detail & Related papers (2024-05-09T01:40:38Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models [10.713680139939354]
Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks.
Parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning.
We propose a new adapter architecture, $p$-adapter, which employs $p$-Laplacian message passing in Graph Neural Networks (GNNs).
arXiv Detail & Related papers (2023-12-17T05:30:35Z)
- PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models [6.622419351156256]
Pre-trained language models (PLMs) show impressive performance in various downstream NLP tasks.
Due to the substantial resources required, many PLM weights are confidential.
We introduce Plug-in External Memory Adaptation (PEMA), a Parameter-Efficient Fine-Tuning (PEFT) method that enables fine-tuning without requiring access to all the weights.
arXiv Detail & Related papers (2023-11-14T23:20:51Z)
- AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning [3.4754314910585626]
We propose a parameter-efficient paradigm for fine-tuning PrLMs based on adapters.
We show that our solution achieves comparable or superior performance to full-scale PrLM fine-tuning and prompt-tuning baselines.
arXiv Detail & Related papers (2023-05-30T04:03:23Z)
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models [75.25782573728677]
This paper presents a framework for adapter-based parameter-efficient fine-tuning (PEFT) of large language models (LLMs).
The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods (a minimal sketch of the series vs. parallel placement follows this list).
We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning.
arXiv Detail & Related papers (2023-04-04T16:31:37Z)
- Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained.
We show that performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view called visual-PETL (V-PETL) to investigate the different aspects affecting the trade-off.
An effective scheme Swin-BAPAT derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)
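As referenced in the LLM-Adapters entry above, series and parallel adapters differ only in where the trainable bottleneck branch attaches around a frozen sublayer. The sketch below is a hypothetical, library-free illustration of that distinction and of how few parameters remain trainable; the class and variable names are assumptions, not APIs from any of the cited works.

```python
# Sketch: series vs. parallel adapter placement around a frozen linear sublayer.
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Small trainable down/up projection used by both placements."""

    def __init__(self, dim: int, r: int = 16):
        super().__init__()
        self.down, self.up, self.act = nn.Linear(dim, r), nn.Linear(r, dim), nn.ReLU()

    def forward(self, x):
        return self.up(self.act(self.down(x)))


class SeriesAdapter(nn.Module):
    """y = f(x) + A(f(x)): the adapter transforms the frozen layer's output."""

    def __init__(self, frozen: nn.Module, dim: int):
        super().__init__()
        self.frozen, self.adapter = frozen, Bottleneck(dim)
        for p in self.frozen.parameters():
            p.requires_grad = False

    def forward(self, x):
        h = self.frozen(x)
        return h + self.adapter(h)


class ParallelAdapter(nn.Module):
    """y = f(x) + A(x): the adapter branches off the frozen layer's input."""

    def __init__(self, frozen: nn.Module, dim: int):
        super().__init__()
        self.frozen, self.adapter = frozen, Bottleneck(dim)
        for p in self.frozen.parameters():
            p.requires_grad = False

    def forward(self, x):
        return self.frozen(x) + self.adapter(x)


x = torch.randn(2, 256)
for wrapper in (SeriesAdapter, ParallelAdapter):
    layer = wrapper(nn.Linear(256, 256), dim=256)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(wrapper.__name__, layer(x).shape, f"trainable {trainable}/{total}")
```

Running the snippet shows that both placements preserve the output shape while leaving only about 11% of the wrapped layer's parameters trainable, which is the trade-off all of the adapter-style methods above exploit.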