On the Complementarity of Data Selection and Fine Tuning for Domain
Adaptation
- URL: http://arxiv.org/abs/2109.07591v1
- Date: Wed, 15 Sep 2021 21:49:06 GMT
- Title: On the Complementarity of Data Selection and Fine Tuning for Domain
Adaptation
- Authors: Dan Iter and David Grangier
- Abstract summary: Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training and then fine tuning.
Data selection improves target domain generalization by training further on pretraining data identified by relying on a small sample of target domain data.
- Score: 22.178874891042994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain adaptation of neural networks commonly relies on three training
phases: pretraining, selected data training and then fine tuning. Data
selection improves target domain generalization by training further on
pretraining data identified by relying on a small sample of target domain data.
This work examines the benefit of data selection for language modeling and
machine translation. Our experiments assess the complementarity of selection
with fine tuning and result in practical recommendations: (i) selected data
must be similar to the fine-tuning domain but not so much as to erode the
complementary effect of fine-tuning; (ii) there is a trade-off between
selecting little data for fast but limited progress or much data for slow but
long lasting progress; (iii) data selection can be applied early during
pretraining, with performance gains comparable to a long pretraining session;
(iv) data selection from domain classifiers is often more effective than the
popular contrastive data selection method.
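As a concrete reading of recommendation (iv), a domain classifier can be trained on a small in-domain sample against random pretraining text and then used to rank the pretraining pool. The sketch below is a minimal illustration; the hashed character n-gram features and top-k cutoff are assumptions, not the paper's exact configuration.

```python
# Sketch: rank pretraining sentences with a binary domain classifier
# trained on a small in-domain sample vs. random pretraining text.
# Features and cutoff are illustrative, not the paper's setup.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

def select_by_domain_classifier(pool, in_domain, k):
    """Return the k pool sentences scored most in-domain."""
    vec = HashingVectorizer(analyzer="char_wb", ngram_range=(3, 5))
    negatives = pool[: len(in_domain)]  # random slice of the pool as negatives
    X = vec.transform(in_domain + negatives)
    y = [1] * len(in_domain) + [0] * len(negatives)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(vec.transform(pool))[:, 1]
    ranked = sorted(range(len(pool)), key=lambda i: -scores[i])
    return [pool[i] for i in ranked[:k]]
```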
Related papers
- Unifying and Optimizing Data Values for Selection via Sequential-Decision-Making [5.755427480127593]
We show that applying data values for selection can be reformulated as a sequential decision-making problem.
We propose an efficient approximation scheme using learned bipartite graphs as surrogate utility models.
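The learned bipartite-graph surrogate is not detailed in this summary; purely to illustrate the sequential-decision framing, here is a generic greedy loop in which a placeholder surrogate utility scores each candidate given the current selection (all names hypothetical).

```python
# Generic greedy loop illustrating the sequential-decision framing:
# at each step, pick the candidate a surrogate utility model scores
# highest given what is already selected. `utility` is a placeholder;
# the paper's learned bipartite-graph surrogate is not reproduced.
def greedy_sequential_select(candidates, utility, budget):
    """utility(selected, candidate) -> float (surrogate utility model)."""
    selected, remaining = [], list(candidates)
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda c: utility(selected, c))
        selected.append(best)
        remaining.remove(best)
    return selected
```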
arXiv Detail & Related papers (2025-02-06T23:03:10Z) - Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
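The paper's data value embedding is not reproduced here; for intuition, a common first-order proxy for a training point's step-wise influence is the inner product of its gradient with the gradient on held-out data, accumulated along the training trajectory. A TracIn-style sketch under an assumed PyTorch interface:

```python
# TracIn-style first-order proxy (not the paper's data value embedding):
# a training point's influence at one step is approximated by
# lr * <grad(train point), grad(validation batch)>; summing this over
# saved checkpoints gives a trajectory-aware estimate.
import torch

def stepwise_influence(model, loss_fn, train_example, val_batch, lr):
    """loss_fn(model, data) -> scalar loss (hypothetical interface)."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_train = torch.autograd.grad(loss_fn(model, train_example), params)
    g_val = torch.autograd.grad(loss_fn(model, val_batch), params)
    return lr * sum((gt * gv).sum() for gt, gv in zip(g_train, g_val))
```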
arXiv Detail & Related papers (2024-12-12T18:28:55Z) - Compute-Constrained Data Selection [77.06528009072967]
We find that many powerful data selection methods are almost never compute-optimal.
For compute-optimal training, we find that perplexity and gradient data selection require training-to-selection model size ratios of 5x and 10x, respectively.
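For concreteness, perplexity-based selection scores each candidate with a smaller reference model and keeps the lowest-perplexity documents; the model name and keep fraction below are illustrative assumptions.

```python
# Sketch of perplexity-based selection: score each document with a
# small reference model and keep the lowest-perplexity fraction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def select_by_perplexity(docs, keep_frac=0.5, model_name="gpt2"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    nlls = []
    with torch.no_grad():
        for doc in docs:
            ids = tok(doc, return_tensors="pt", truncation=True).input_ids
            nlls.append(model(ids, labels=ids).loss.item())  # ppl = exp(nll)
    ranked = sorted(range(len(docs)), key=lambda i: nlls[i])
    return [docs[i] for i in ranked[: int(keep_frac * len(docs))]]
```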
arXiv Detail & Related papers (2024-10-21T17:11:21Z) - A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which degrades training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
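The framework's details are not in this summary; as one plausible use of multimodal information, the sketch below ranks image-text pairs by CLIP cosine alignment using the OpenAI `clip` package (model choice and keep fraction are assumptions).

```python
# Sketch: score image-text pairs by CLIP cosine alignment and keep the
# best-aligned fraction. One plausible use of multimodal signals for
# selection, not the paper's full framework.
import clip  # OpenAI CLIP package (assumed tooling)
import torch
from PIL import Image

def select_by_clip_alignment(pairs, keep_frac=0.8, device="cpu"):
    """pairs: list of (image_path, caption) tuples."""
    model, preprocess = clip.load("ViT-B/32", device=device)
    scores = []
    with torch.no_grad():
        for path, caption in pairs:
            img = preprocess(Image.open(path)).unsqueeze(0).to(device)
            txt = clip.tokenize([caption]).to(device)
            sim = torch.cosine_similarity(model.encode_image(img),
                                          model.encode_text(txt))
            scores.append(sim.item())
    ranked = sorted(range(len(pairs)), key=lambda i: -scores[i])
    return [pairs[i] for i in ranked[: int(keep_frac * len(pairs))]]
```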
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [16.654859430784825]
Current data selection methods, which rely on either hand-crafted rules or larger reference models, are conducted statically and do not capture the evolving data preferences during pretraining.
We introduce model-aware data selection with data influence models (MATES), where a data influence model continuously adapts to the evolving data preferences of the pretraining model and then selects the data most effective for the current pretraining progress.
Experiments of pretraining 410M and 1B models on the C4 dataset demonstrate that MATES significantly outperforms random data selection on extensive downstream tasks.
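Schematically, the loop alternates between refitting a small influence model to the current pretraining model's preferences and training on the data it scores highest; everything below is a placeholder-level sketch, not the MATES implementation.

```python
# Placeholder-level schematic of a model-aware selection loop: refit a
# small influence model to the current pretraining model's preferences
# every `refit_every` steps, then train on the data it scores highest.
# `influence_model.fit/score` and `model.train_step` are hypothetical.
def model_aware_pretraining(model, pool, influence_model,
                            steps, refit_every, batch_size):
    for step in range(steps):
        if step % refit_every == 0:
            influence_model.fit(model, pool)  # adapt to evolving preferences
        batch = sorted(pool, key=influence_model.score,
                       reverse=True)[:batch_size]
        model.train_step(batch)
    return model
```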
arXiv Detail & Related papers (2024-06-10T06:27:42Z) - TextGram: Towards a better domain-adaptive pretraining [0.3769303106863454]
In NLP, pre-training involves using a large amount of text data to gain prior knowledge for performing downstream tasks.
We propose our own domain-adaptive data selection method - TextGram.
We show that the proposed strategy works better compared to other selection methods.
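TextGram's scoring is not specified in this summary; the name suggests n-gram statistics, so here is a generic n-gram-overlap selection sketch with all details assumed.

```python
# Generic n-gram-overlap sketch (not TextGram's published algorithm):
# score each candidate document by the fraction of its n-grams that
# also occur in a small in-domain corpus.
from collections import Counter

def ngrams(text, n=3):
    toks = text.split()
    return zip(*(toks[i:] for i in range(n)))

def select_by_ngram_overlap(pool, in_domain, k, n=3):
    domain = Counter(g for doc in in_domain for g in ngrams(doc, n))
    def score(doc):
        grams = list(ngrams(doc, n))
        return sum(g in domain for g in grams) / len(grams) if grams else 0.0
    return sorted(pool, key=score, reverse=True)[:k]
```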
arXiv Detail & Related papers (2024-04-28T15:44:57Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
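In spirit, LESS scores each candidate by the similarity of its low-dimensional gradient features to gradients from a few target-task examples; the sketch below substitutes a plain random projection for LESS's LoRA-based gradient features and omits its optimizer-aware corrections.

```python
# Simplified gradient-similarity selection in the spirit of LESS:
# project per-example gradients to low dimension, then rank candidates
# by cosine similarity to the mean target-task gradient feature.
import torch

def grad_features(model, loss_fn, example, proj):
    """Flatten per-example gradients, then project to low dimension."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss_fn(model, example), params)
    return proj @ torch.cat([g.reshape(-1) for g in grads])

def select_by_grad_similarity(model, loss_fn, pool, target, k, dim=512):
    n = sum(p.numel() for p in model.parameters() if p.requires_grad)
    proj = torch.randn(dim, n) / dim ** 0.5  # random projection (sketch only)
    tgt = torch.stack([grad_features(model, loss_fn, e, proj)
                       for e in target]).mean(0)
    sims = [torch.cosine_similarity(grad_features(model, loss_fn, e, proj),
                                    tgt, dim=0).item() for e in pool]
    return sorted(range(len(pool)), key=lambda i: -sims[i])[:k]
```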
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
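One natural instantiation of online data mixing is a multi-armed bandit over data groups whose mixing weights are updated from training loss; the EXP3-style sketch below is illustrative, and the loss-as-reward shaping is an assumption rather than the paper's exact algorithm.

```python
# Illustrative EXP3-style bandit over data groups: sample a group from
# the current mixing distribution, train on one of its batches, and use
# the observed loss as reward so weights shift toward groups with more
# left to learn.
import math
import random

def online_data_mixing(groups, train_step, steps, lr=0.1):
    """groups: callables returning a batch; train_step(batch) -> loss."""
    logw = [0.0] * len(groups)
    for _ in range(steps):
        m = max(logw)
        probs = [math.exp(w - m) for w in logw]
        total = sum(probs)
        probs = [p / total for p in probs]
        i = random.choices(range(len(groups)), weights=probs)[0]
        loss = train_step(groups[i]())
        logw[i] += lr * loss / probs[i]  # importance-weighted reward update
    return probs  # final mixing distribution
```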
arXiv Detail & Related papers (2023-12-05T00:42:35Z) - Analyzing domain shift when using additional data for the MICCAI KiTS23
Challenge [5.745796568988237]
We study techniques that ameliorate domain shift during training so that the additional data can be preprocessed and trained on together with the original data.
Our results show that transforming the additional data using histogram matching has better results than using simple normalization.
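Histogram matching is available directly in scikit-image; a minimal sketch of transforming an additional-data scan to match the intensity distribution of a reference scan from the original dataset (I/O and any resampling omitted):

```python
# Minimal histogram-matching sketch with scikit-image: make an
# additional-data volume's intensity distribution match a reference
# volume from the original dataset.
import numpy as np
from skimage.exposure import match_histograms

def harmonize(additional: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Match `additional`'s intensity histogram to `reference`'s."""
    return match_histograms(additional, reference)
```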
arXiv Detail & Related papers (2023-09-05T07:31:22Z) - Improved Fine-tuning by Leveraging Pre-training Data: Theory and
Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can reach final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
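The selection criterion itself is not given in this summary; a simple stand-in is to keep the pretraining examples whose embeddings lie closest to the mean embedding of the target data (the embedding function and cutoff k are assumptions).

```python
# Stand-in sketch: keep pretraining examples whose embeddings are
# closest (by cosine) to the mean target-data embedding. The embedding
# function producing these arrays is assumed, not the paper's criterion.
import numpy as np

def select_near_target(pretrain_feats, target_feats, k):
    """Both inputs: (n, d) arrays of example embeddings."""
    center = target_feats.mean(axis=0)
    center /= np.linalg.norm(center)
    norms = np.linalg.norm(pretrain_feats, axis=1, keepdims=True)
    sims = (pretrain_feats / norms) @ center
    return np.argsort(-sims)[:k]  # indices of the k most target-like examples
```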
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.