Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
- URL: http://arxiv.org/abs/2410.10636v2
- Date: Mon, 24 Mar 2025 09:17:13 GMT
- Title: Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
- Authors: Adyasha Maharana, Jaehong Yoon, Tianlong Chen, Mohit Bansal,
- Abstract summary: Adapt-$infty$ is a new multi-way and adaptive data selection approach for lifelong instruction tuning.<n>We construct pseudo-skill clusters by grouping gradient-based sample vectors.<n>We select the best-performing data selector for each skill cluster from a pool of selector experts.<n>This data selector samples a subset of the most important samples from each skill cluster for training.
- Score: 89.42023974249122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual instruction datasets from various distributors are released at different times and often contain a significant number of semantically redundant text-image pairs, depending on their task compositions (i.e., skills) or reference sources. This redundancy greatly limits the efficient deployment of continually adaptable multimodal large language models, hindering their ability to refine existing skills and acquire new competencies over time. We reframe the problem of lifelong Instruction Tuning (LiIT) via data selection, where the model automatically selects beneficial samples to learn from earlier and new datasets based on the current state of acquired knowledge in the model. We propose Adapt-$\infty$, a new multi-way and adaptive data selection approach that dynamically balances sample efficiency and effectiveness during LiIT. We first construct pseudo-skill clusters by grouping gradient-based sample vectors. Next, we select the best-performing data selector for each skill cluster from a pool of selector experts, including our newly proposed scoring function, Image Grounding score. This data selector samples a subset of the most important samples from each skill cluster for training. To prevent the continuous increase in the size of the dataset pool during LiIT, we introduce a cluster-wise permanent data pruning strategy to remove the most semantically redundant samples from each cluster, keeping computational requirements manageable. We validate the effectiveness and efficiency of Adapt-$\infty$ over a sequence of multimodal instruction tuning datasets with various tasks, including (Knowledge) VQA, multilingual, grounding, reasoning, language-only, and multi-image comprehension. Training with samples selected by Adapt-$\infty$ alleviates catastrophic forgetting, especially for rare tasks, and promotes forward transfer across the continuum using only a fraction of the original data.
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z) - Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection [50.30291173608449]
We propose a novel multi-objective meta-learning network (M$3$BS) for zero-shot hyperspectral band selection.
In M$3$BS, a generalizable graph convolution network (GCN) is constructed to generate dataset-agnostic base.
The acquired meta-knowledge can be directly transferred to unseen datasets without any retraining or fine-tuning.
arXiv Detail & Related papers (2024-06-12T07:13:31Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - GistScore: Learning Better Representations for In-Context Example
Selection with Gist Bottlenecks [3.9638110494107095]
In-context Learning (ICL) is the ability of Large Language Models (LLMs) to perform new tasks when conditioned on prompts.
We propose Example Gisting, a novel approach for training example encoders through supervised fine-tuning.
We show that our fine-tuned models get state-of-the-art ICL performance with over 20% absolute gain over off-the-shelf retrievers.
arXiv Detail & Related papers (2023-11-16T06:28:05Z) - Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning [47.02160072880698]
We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
arXiv Detail & Related papers (2023-11-14T14:10:40Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z) - Single-dataset Experts for Multi-dataset Question Answering [6.092171111087768]
We train a network on multiple datasets to generalize and transfer better to new datasets.
Our approach is to model multi-dataset question answering with a collection of single-dataset experts.
Simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance.
arXiv Detail & Related papers (2021-09-28T17:08:22Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.