Mixture of Experts Using Tensor Products
- URL: http://arxiv.org/abs/2405.16671v1
- Date: Sun, 26 May 2024 19:25:08 GMT
- Title: Mixture of Experts Using Tensor Products
- Authors: Zhan Su, Fengran Mo, Prayag Tiwari, Benyou Wang, Jian-Yun Nie, Jakob Grue Simonsen
- Abstract summary: In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously.
We investigate if modular language models can facilitate positive transfer and systematic generalization.
Specifically, we propose a novel modular language model (TensorPoly) that balances parameter efficiency with nuanced routing methods.
- Score: 44.816454454687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously. However, the training signals from different tasks can interfere with one another, potentially leading to \textit{negative transfer}. To mitigate this, we investigate if modular language models can facilitate positive transfer and systematic generalization. Specifically, we propose a novel modular language model (\texttt{TensorPoly}) that balances parameter efficiency with nuanced routing methods. For \textit{modules}, we reparameterize Low-Rank Adaptation (\texttt{LoRA}) by employing an entangled tensor through the use of tensor product operations and name the resulting approach \texttt{TLoRA}. For the \textit{routing function}, we tailor two innovative routing functions according to granularity: \texttt{TensorPoly-I} routes to each rank within the entangled tensor, while \texttt{TensorPoly-II} offers a finer-grained routing approach targeting each order of the entangled tensor. The experimental results from the multi-task T0 benchmark demonstrate that: 1) all modular LMs surpass the corresponding dense approaches, highlighting the potential of modular language models to mitigate negative interference in multi-task learning and deliver superior outcomes. 2) \texttt{TensorPoly-I} achieves higher parameter efficiency in adaptation and outperforms other modular LMs, which shows the potential of our approach in multi-task transfer learning.
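As a rough sketch of the idea (hypothetical class and parameter names, not the authors' released implementation), a TLoRA-style update can be built by Kronecker-composing small cores into each rank's factor of the low-rank update, with a learned gate over ranks standing in for TensorPoly-I-style routing:

```python
import torch
import torch.nn as nn

class TensorizedLoRALinear(nn.Module):
    """Illustrative sketch: a frozen linear layer plus a low-rank update
    whose per-rank factor is a Kronecker (tensor) product of small cores,
    with a learned gate over ranks as a TensorPoly-I-style router."""

    def __init__(self, d_in: int, d_out: int, rank: int = 4, order: int = 2):
        super().__init__()
        m = round(d_in ** (1 / order))   # per-core input size
        n = round(d_out ** (1 / order))  # per-core output size
        assert m ** order == d_in and n ** order == d_out, "cores must tile the layer"
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # One small core per (rank, order) slot; Kronecker-composing the
        # `order` cores of a rank yields a full d_out x d_in update.
        self.cores = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, n, m)) for _ in range(order)]
        )
        self.gate = nn.Parameter(torch.zeros(rank))  # routing logits over ranks

    def delta_weight(self) -> torch.Tensor:
        w = self.cores[0]  # (rank, n, m)
        for core in self.cores[1:]:
            # batched Kronecker product, carried out per rank
            r, a, b = w.shape
            _, c, d = core.shape
            w = torch.einsum("rab,rcd->racbd", w, core).reshape(r, a * c, b * d)
        probs = torch.softmax(self.gate, dim=0)    # soft routing over ranks
        return torch.einsum("r,rij->ij", probs, w)  # (d_out, d_in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_weight().t()
```

Raising `order` is what buys parameter efficiency in this sketch: with d_in = d_out = 4096 and order = 3, each rank costs three 16x16 cores (768 parameters) versus 8192 for a rank-one LoRA factor of the same layer.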
Related papers
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference [32.62084449979531]
We extend SortedNet to generative NLP tasks by replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).
Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference.
Our results show the superior performance of sub-models in comparison to Standard Fine-Tuning and SFT+ICT (Early-Exit).
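As a minimal illustration of the sub-model idea (a hypothetical helper, not the paper's code), dynamic inference amounts to running only a prefix of the layer-sorted network and decoding from the shared head:

```python
import torch.nn as nn

def submodel_forward(blocks: nn.ModuleList, head: nn.Module, x, depth: int):
    """Run only the first `depth` blocks of a layer-sorted network, then the
    shared head; smaller `depth` trades accuracy for latency at inference."""
    for block in blocks[:depth]:
        x = block(x)
    return head(x)
```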
arXiv Detail & Related papers (2023-09-16T11:58:34Z)
- Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance.
We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices.
We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least squares support vector machines (LSSVMs).
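As a generic illustration of this construction (a CP-style factorization, not necessarily the paper's exact formulation), stacking the per-task weights into a tensor indexed by the task modes and constraining its rank is what couples the tasks:

```latex
\mathcal{W} \;\approx\; \sum_{r=1}^{R} \mathbf{u}_r \otimes \mathbf{a}^{(1)}_r \otimes \cdots \otimes \mathbf{a}^{(K)}_r ,
\qquad
\mathbf{w}_{t_1,\dots,t_K} \;=\; \sum_{r=1}^{R} \Big( \prod_{k=1}^{K} a^{(k)}_{r,\,t_k} \Big)\, \mathbf{u}_r ,
```

where the shared factors \mathbf{u}_r live in feature space and each \mathbf{a}^{(k)}_r distributes them along the k-th task index, so tasks that share indices share parameters.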
arXiv Detail & Related papers (2023-08-30T14:28:26Z)
- On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata into continuous prompts.
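A minimal sketch of the general mechanism (prompt-tuning style; `ContinuousPromptWrapper` and its signature are hypothetical, not PRopS itself): a small trainable network maps an instruction encoding to continuous prompt vectors that are prepended to the frozen PLM's input embeddings.

```python
import torch
import torch.nn as nn

class ContinuousPromptWrapper(nn.Module):
    """Hypothetical sketch: map an instruction encoding to `prompt_len`
    continuous prompt vectors and prepend them to the frozen PLM's
    input embeddings."""

    def __init__(self, instr_dim: int, d_model: int, prompt_len: int = 8):
        super().__init__()
        self.proj = nn.Linear(instr_dim, prompt_len * d_model)
        self.prompt_len, self.d_model = prompt_len, d_model

    def forward(self, instr_vec: torch.Tensor, input_embeds: torch.Tensor):
        # instr_vec: (batch, instr_dim); input_embeds: (batch, seq, d_model)
        prompts = self.proj(instr_vec).view(-1, self.prompt_len, self.d_model)
        return torch.cat([prompts, input_embeds], dim=1)  # prepend prompts
```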
arXiv Detail & Related papers (2023-07-04T02:47:42Z)
- Composing Parameter-Efficient Modules with Arithmetic Operations [20.119291936493788]
We propose to compose parameter-efficient modules through linear arithmetic operations in the weight space.
Our approach requires no additional training and enables highly flexible module composition.
We extend our approach to detoxify Alpaca-LoRA, the latest instruction-tuned large language model based on LLaMA.
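A minimal sketch of composition in weight space (the helper name and dict-of-tensors layout are assumptions, not the paper's API): module updates are merged with a weighted sum, and a negative coefficient subtracts a skill, which is how detoxification-style composition works.

```python
import torch

def compose_module_deltas(deltas, coeffs):
    """Hypothetical helper: linearly combine parameter-efficient module
    updates (name -> tensor dicts) in weight space. Positive coefficients
    merge skills; negative ones subtract a skill, e.g. a module
    fine-tuned on toxic data."""
    return {name: sum(c * d[name] for c, d in zip(coeffs, deltas))
            for name in deltas[0]}

# Toy usage: merge two task modules and negate a "toxicity" module.
task_a = {"lora.weight": torch.randn(4, 4)}
task_b = {"lora.weight": torch.randn(4, 4)}
toxic = {"lora.weight": torch.randn(4, 4)}
merged = compose_module_deltas([task_a, task_b, toxic], [0.5, 0.5, -0.3])
```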
arXiv Detail & Related papers (2023-06-26T17:33:21Z)
- Tensorized LSSVMs for Multitask Regression [48.844191210894245]
Multitask learning (MTL) can utilize the relatedness between multiple tasks for performance improvement.
A new MTL method, tLSSVM-MTL, is proposed by leveraging low-rank tensor analysis and Least Squares Support Vector Machines (LSSVMs).
arXiv Detail & Related papers (2023-03-04T16:36:03Z)
- Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following [26.386772715777223]
This paper proposes using multimodal generative models for semi-supervised learning in the instruction following tasks.
The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data.
Experiments on BabyAI and Room-to-Room environments show that the proposed method improves the performance of instruction following by leveraging unpaired data.
arXiv Detail & Related papers (2022-12-29T03:23:43Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)