Large Dimensional Analysis and Improvement of Multi Task Learning
- URL: http://arxiv.org/abs/2009.01591v1
- Date: Thu, 3 Sep 2020 11:40:14 GMT
- Title: Large Dimensional Analysis and Improvement of Multi Task Learning
- Authors: Malik Tiomoko, Romain Couillet and Hafiz Tiomoko
- Abstract summary: Multi Task Learning (MTL) efficiently leverages useful information contained in multiple related tasks to help improve the generalization performance of all tasks.
This article conducts a large dimensional analysis of a simple but, as we shall see, extremely powerful (when carefully tuned) Least Squares Support Vector Machine (LSSVM) version of MTL.
- Score: 38.86699890656948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi Task Learning (MTL) efficiently leverages useful information contained
in multiple related tasks to help improve the generalization performance of all
tasks. This article conducts a large dimensional analysis of a simple but, as
we shall see, extremely powerful (when carefully tuned) Least Squares Support
Vector Machine (LSSVM) version of MTL, in the regime where the dimension $p$ of
the data and their number $n$ grow large at the same rate.
Under mild assumptions on the input data, the theoretical analysis of the
MTL-LSSVM algorithm first reveals the "sufficient statistics" exploited by the
algorithm and the way they interact. As a striking consequence, these results
demonstrate that the standard approach to MTL-LSSVM is largely suboptimal and
can lead to severe negative transfer, but that these impairments are easily
corrected. The corrections are turned into an improved MTL-LSSVM algorithm
which can only benefit from additional data and whose theoretical performance
is also analyzed.
As evidenced and theoretically supported in numerous recent works, these
large dimensional results are robust to broad ranges of data distributions,
which our present experiments corroborate. Specifically, the article reports a
systematically close match between theoretical and empirical performances on
popular datasets, which strongly suggests that the proposed carefully tuned
MTL-LSSVM method is applicable to real data. This tuning is fully based on the
theoretical analysis and, in particular, requires no cross-validation
procedure. Moreover, the reported performances on real datasets almost
systematically outperform those of much more elaborate and less intuitive
state-of-the-art multi-task and transfer learning methods.
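Since the abstract centers on the LSSVM base learner, the short sketch below illustrates, in plain NumPy, how a single-task linear-kernel LSSVM is trained by solving one linear system. This is only an illustration under stated assumptions: the function names (`fit_lssvm`, `predict_lssvm`) and the regularization parameter `gamma` are illustrative choices, and the paper's multi-task coupling and theory-driven tuning are not reproduced here.

```python
# Minimal single-task LSSVM sketch (NOT the paper's MTL-LSSVM formulation).
import numpy as np

def fit_lssvm(X, y, gamma=1.0):
    """Train a linear LSSVM: X is (n, p), y is (n,) with labels in {-1, +1}.

    The dual variables (b, alpha) solve the standard LSSVM linear system
        [[0,  1^T          ],   [[b    ],   [[0],
         [1,  K + I/gamma ]]  *  [alpha]] =  [y]]
    with K = X X^T the linear kernel matrix.
    """
    n = X.shape[0]
    K = X @ X.T                              # linear kernel, shape (n, n)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma        # ridge-like regularization
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    w = X.T @ alpha                          # primal weights for a linear kernel
    return w, b

def predict_lssvm(w, b, X_new):
    """Return decisions in {-1, +1} for the rows of X_new, shape (m, p)."""
    return np.sign(X_new @ w + b)

# Toy usage on synthetic Gaussian data, with p and n of comparable size as in
# the large dimensional regime the paper studies.
rng = np.random.default_rng(0)
p, n = 100, 200
X = np.vstack([rng.normal(+1.0, 1.0, (n // 2, p)) / np.sqrt(p),
               rng.normal(-1.0, 1.0, (n // 2, p)) / np.sqrt(p)])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
w, b = fit_lssvm(X, y, gamma=1.0)
print("training accuracy:", np.mean(predict_lssvm(w, b, X) == y))
```

In the paper's setting, the tasks' data would additionally be coupled through the MTL-LSSVM formulation, with the hyperparameters set by the large dimensional analysis rather than by cross validation, as stated in the abstract above.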
Related papers
- Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance [0.32985979395737774]
We study the application of large language models (LLMs) in domain-specific contexts, including finance.
We find that fine-tuning exclusively on the target task is not always the most effective strategy.
Instead, multi-task fine-tuning can significantly enhance performance.
arXiv Detail & Related papers (2024-10-01T22:35:56Z) - Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods.
We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground.
We also propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z) - Semantic-Preserving Feature Partitioning for Multi-View Ensemble
Learning [11.415864885658435]
We introduce the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory.
The SPFP algorithm effectively partitions datasets into multiple semantically consistent views, enhancing the multi-view ensemble learning process.
It maintains model accuracy while significantly improving uncertainty measures in scenarios where high generalization performance is achievable.
arXiv Detail & Related papers (2024-01-11T20:44:45Z) - Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance.
We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices.
We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least squares support vector machines (LSSVMs).
arXiv Detail & Related papers (2023-08-30T14:28:26Z) - Scaling Relationship on Learning Mathematical Reasoning with Large
Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, which significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%.
arXiv Detail & Related papers (2023-08-03T15:34:01Z) - When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning [15.39115079099451]
Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
arXiv Detail & Related papers (2022-05-17T06:48:45Z) - PCA-based Multi Task Learning: a Random Matrix Approach [40.49988553835459]
The article proposes and theoretically analyses a computationally efficient multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes (Barshan et al., 2011; Bair et al., 2006).
arXiv Detail & Related papers (2021-11-01T13:13:38Z) - SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task
Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications.
The best MTL optimization methods require individually computing the gradient of each task's loss function.
We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.