Large Dimensional Analysis and Improvement of Multi Task Learning
- URL: http://arxiv.org/abs/2009.01591v1
- Date: Thu, 3 Sep 2020 11:40:14 GMT
- Title: Large Dimensional Analysis and Improvement of Multi Task Learning
- Authors: Malik Tiomoko, Romain Couillet and Hafiz Tiomoko
- Abstract summary: Multi Task Learning (MTL) efficiently leverages useful information contained in multiple related tasks to help improve the generalization performance of all tasks.
This article conducts a large dimensional analysis of a simple but, as we shall see, extremely powerful (when carefully tuned) Least Squares Support Vector Machine (LSSVM) version of MTL.
- Score: 38.86699890656948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi Task Learning (MTL) efficiently leverages useful information contained
in multiple related tasks to help improve the generalization performance of all
tasks. This article conducts a large dimensional analysis of a simple but, as
we shall see, extremely powerful (when carefully tuned) Least Squares Support
Vector Machine (LSSVM) version of MTL, in the regime where the dimension $p$ of
the data and their number $n$ grow large at the same rate.
Under mild assumptions on the input data, the theoretical analysis of the
MTL-LSSVM algorithm first reveals the "sufficient statistics" exploited by the
algorithm and the way they interact. As a striking consequence, these results
demonstrate that the standard approach to MTL-LSSVM is largely suboptimal and
can lead to severe negative transfer, but that these impairments are easily
corrected. The corrections are turned into an improved MTL-LSSVM algorithm
which can only benefit from additional data and whose theoretical performance
is also analyzed.
As evidenced and theoretically supported in numerous recent works, these
large dimensional results are robust to broad ranges of data distributions,
which our present experiments corroborate. Specifically, the article reports a
systematically close match between theoretical and empirical performances on
popular datasets, which strongly suggests that the proposed carefully tuned
MTL-LSSVM method is applicable to real data. This tuning is fully based on the
theoretical analysis and, in particular, requires no cross-validation
procedure. Moreover, the reported performances on real datasets almost
systematically outperform those of much more elaborate and less intuitive
state-of-the-art multi-task and transfer learning methods.
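Since the abstract centers on the LSSVM base learner, the short sketch below illustrates, in plain NumPy, how a single-task linear-kernel LSSVM is trained by solving one linear system. This is only an illustration under stated assumptions: the function names (`fit_lssvm`, `predict_lssvm`) and the regularization parameter `gamma` are illustrative choices, and the paper's multi-task coupling and theory-driven tuning are not reproduced here.

```python
# Minimal single-task LSSVM sketch (NOT the paper's MTL-LSSVM formulation).
import numpy as np

def fit_lssvm(X, y, gamma=1.0):
    """Train a linear LSSVM: X is (n, p), y is (n,) with labels in {-1, +1}.

    The dual variables (b, alpha) solve the standard LSSVM linear system
        [[0,  1^T          ],   [[b    ],   [[0],
         [1,  K + I/gamma ]]  *  [alpha]] =  [y]]
    with K = X X^T the linear kernel matrix.
    """
    n = X.shape[0]
    K = X @ X.T                              # linear kernel, shape (n, n)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma        # ridge-like regularization
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    w = X.T @ alpha                          # primal weights for a linear kernel
    return w, b

def predict_lssvm(w, b, X_new):
    """Return decisions in {-1, +1} for the rows of X_new, shape (m, p)."""
    return np.sign(X_new @ w + b)

# Toy usage on synthetic Gaussian data, with p and n of comparable size as in
# the large dimensional regime the paper studies.
rng = np.random.default_rng(0)
p, n = 100, 200
X = np.vstack([rng.normal(+1.0, 1.0, (n // 2, p)) / np.sqrt(p),
               rng.normal(-1.0, 1.0, (n // 2, p)) / np.sqrt(p)])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
w, b = fit_lssvm(X, y, gamma=1.0)
print("training accuracy:", np.mean(predict_lssvm(w, b, X) == y))
```

In the paper's setting, the tasks' data would additionally be coupled through the MTL-LSSVM formulation, with the hyperparameters set by the large dimensional analysis rather than by cross validation, as stated in the abstract above.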
Related papers
- Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance [0.32985979395737774]
We study the application of large language models (LLMs) in domain-specific contexts, including finance.
We find that fine-tuning exclusively on the target task is not always the most effective strategy.
Instead, multi-task fine-tuning can significantly enhance performance.
arXiv Detail & Related papers (2024-10-01T22:35:56Z) - Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods.
We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground.
We also propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z) - Semantic-Preserving Feature Partitioning for Multi-View Ensemble
Learning [11.415864885658435]
We introduce the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory.
The SPFP algorithm effectively partitions datasets into multiple semantically consistent views, enhancing the multi-view ensemble learning process.
It maintains model accuracy while significantly improving uncertainty measures in scenarios where high generalization performance is achievable.
arXiv Detail & Related papers (2024-01-11T20:44:45Z) - Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance.
We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices.
We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least squares support vector machines (LSSVMs).
arXiv Detail & Related papers (2023-08-30T14:28:26Z) - Scaling Relationship on Learning Mathematical Reasoning with Large
Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, which significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%.
arXiv Detail & Related papers (2023-08-03T15:34:01Z) - When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning [15.39115079099451]
Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
arXiv Detail & Related papers (2022-05-17T06:48:45Z) - PCA-based Multi Task Learning: a Random Matrix Approach [40.49988553835459]
The article proposes and theoretically analyses a computationally efficient multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes (Barshan et al., 2011; Bair et al., 2006).
arXiv Detail & Related papers (2021-11-01T13:13:38Z) - SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task
Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications.
The best MTL optimization methods require individually computing the gradient of each task's loss function.
We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.