Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of
Tasks with Multi-dimensional Relations
- URL: http://arxiv.org/abs/2110.13365v1
- Date: Tue, 26 Oct 2021 02:35:51 GMT
- Title: Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of
Tasks with Multi-dimensional Relations
- Authors: Junning Liu, Zijie Xia, Yu Lei, Xinjian Li, Xu Wang
- Abstract summary: This work studies the "macro" perspective of shared learning network design and proposes a Multi-Faceted Hierarchical MTL model (MFH).
MFH exploits multi-dimensional task relations with a nested hierarchical tree structure that maximizes shared learning.
We evaluate MFH and SOTA models on a large industrial video platform with 10 billion samples; the results show that MFH significantly outperforms SOTA MTL models in both offline and online evaluations.
- Score: 10.326429525379181
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There have been many studies on improving the efficiency of shared
learning in Multi-Task Learning (MTL). Previous work focused on the "micro"
sharing perspective for a small number of tasks, while in Recommender Systems
(RS) and other AI applications there is often a need to model a large number
of tasks with multi-dimensional task relations. For example, when using MTL to
model various user behaviors in RS, differentiating new users and new items
from old ones leads to a Cartesian-product-style increase in the number of
tasks, with multi-dimensional relations among them. This work studies the
"macro" perspective of shared-learning network design and proposes a
Multi-Faceted Hierarchical MTL model (MFH). MFH exploits the multi-dimensional
task relations with a nested hierarchical tree structure that maximizes shared
learning. We evaluate MFH and SOTA models on a large industrial video platform
with 10 billion samples, and the results show that MFH significantly
outperforms SOTA MTL models in both offline and online evaluations across all
user groups. The gains are especially remarkable for new users, with an online
increase of 9.1% in app time per user and 1.85% in next-day retention rate.
MFH has now been deployed in a large-scale online video recommender system.
MFH is especially beneficial for cold-start problems in RS, where new users
and new items often suffer from a "local overfitting" phenomenon; the idea
itself, however, is generic and widely applicable to other MTL scenarios.
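The abstract describes MFH only at a high level, so the following is a minimal, hypothetical sketch of what a nested hierarchical sharing structure over a two-dimensional task grid could look like. The facet names (user group and behavior type), module layout, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): a nested hierarchical
# sharing structure for tasks indexed by two facets, e.g.
# user_group in {"new", "old"} x behavior in {"click", "finish"}.
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class HierarchicalMTL(nn.Module):
    """Root -> facet-level branches -> task-specific towers.

    Every task shares the root; tasks that agree on a facet value share
    that facet's branch; only the final tower is task-specific.
    """

    def __init__(self, in_dim, hid, user_groups, behaviors):
        super().__init__()
        self.root = MLP(in_dim, hid)                    # shared by all tasks
        self.user_branch = nn.ModuleDict(
            {g: MLP(hid, hid) for g in user_groups})    # shared within a user group
        self.behavior_branch = nn.ModuleDict(
            {b: MLP(hid, hid) for b in behaviors})      # shared within a behavior
        self.towers = nn.ModuleDict(
            {f"{g}:{b}": nn.Linear(2 * hid, 1)
             for g in user_groups for b in behaviors})  # task-specific heads

    def forward(self, x, user_group, behavior):
        h = self.root(x)
        hg = self.user_branch[user_group](h)
        hb = self.behavior_branch[behavior](h)
        logit = self.towers[f"{user_group}:{behavior}"](torch.cat([hg, hb], dim=-1))
        return torch.sigmoid(logit)


model = HierarchicalMTL(in_dim=64, hid=32,
                        user_groups=["new", "old"],
                        behaviors=["click", "finish"])
prob = model(torch.randn(8, 64), user_group="new", behavior="click")
```

In this toy layout the 2x2 Cartesian product of facet values yields four tasks; every task shares the root, tasks that agree on a facet value share that facet's branch, and only the final tower is task-specific, which is the flavor of multi-faceted hierarchical sharing the abstract refers to.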
Related papers
- An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems [12.277443583840963]
We propose a novel method called Enhanced-State RL for Multi-Task Fusion (MTF) in Recommender Systems (RSs).
Our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor and critic learning process that utilizes the enhanced state to select much better actions for each user-item pair.
arXiv Detail & Related papers (2024-09-18T03:34:31Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - DEPHN: Different Expression Parallel Heterogeneous Network using virtual
gradient optimization for Multi-task Learning [1.0705399532413615]
Recommendation system algorithms based on multi-task learning (MTL) are the major method for Internet operators to understand users and predict their behaviors.
Traditional models use shared-bottom structures and gating experts to realize shared representation learning and information differentiation.
We propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously.
arXiv Detail & Related papers (2023-07-24T04:29:00Z) - Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z) - M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z) - Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts (a rough sketch of this kind of routing appears after this list).
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z) - Exceeding the Limits of Visual-Linguistic Multi-Task Learning [0.0]
We construct 1000 unique classification tasks that share similarly-structured input data.
These classification tasks focus on learning the product hierarchy of different e-commerce websites.
We solve these tasks in unison using multi-task learning (MTL).
arXiv Detail & Related papers (2021-07-27T19:42:14Z) - Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
arXiv Detail & Related papers (2021-07-23T17:26:40Z) - Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time.
With a fixed model capacity, the tasks can conflict with each other, and the system usually has to make a trade-off when learning all of them together.
This work proposes a novel controllable multi-task learning framework that enables the system to make real-time trade-offs among different tasks with a single model.
arXiv Detail & Related papers (2020-10-13T11:53:55Z) - Boosting Share Routing for Multi-task Learning [0.12891210250935145]
Multi-task learning (MTL) aims to make full use of the knowledge contained in multi-task supervision signals to improve the overall performance.
How to share the knowledge of multiple tasks appropriately is an open problem for MTL.
We propose a general framework called Multi-Task Neural Architecture Search (MTNAS) to efficiently find a suitable sharing route for a given MTL problem.
arXiv Detail & Related papers (2020-09-01T12:37:19Z) - MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning [82.62433731378455]
We show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales.
We propose a novel architecture, namely MTI-Net, that builds upon this finding.
arXiv Detail & Related papers (2020-01-19T21:02:36Z)