The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural
Language Understanding
- URL: http://arxiv.org/abs/2002.07972v2
- Date: Fri, 15 May 2020 21:47:31 GMT
- Authors: Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel
Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao and Jianfeng Gao
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MT-DNN, an open-source natural language understanding (NLU)
toolkit that makes it easy for researchers and developers to train customized
deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed
to facilitate rapid customization for a broad spectrum of NLU tasks, using a
variety of objectives (classification, regression, structured prediction) and
text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is
its built-in support for robust and transferable learning using the adversarial
multi-task learning paradigm. To enable efficient production deployment, MT-DNN
supports multi-task knowledge distillation, which can substantially compress a
deep neural model without significant performance drop. We demonstrate the
effectiveness of MT-DNN on a wide range of NLU applications across general and
biomedical domains. The software and pre-trained models will be publicly
available at https://github.com/namisan/mt-dnn.
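
To make the multi-task design concrete, below is a minimal sketch of the shared-encoder / per-task-head pattern that MT-DNN builds on. This is not the MT-DNN API: the toy LSTM encoder (standing in for BERT, RoBERTa, or UniLM), the head names, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of multi-task learning with a shared encoder and
# task-specific heads. NOT the MT-DNN API; names and sizes are placeholders.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=30522, hidden=128):
        super().__init__()
        # Shared text encoder (MT-DNN would plug in BERT/RoBERTa/UniLM here).
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        # One lightweight output head per task, each with its own objective.
        self.heads = nn.ModuleDict({
            "classification": nn.Linear(hidden, 3),  # e.g., 3-way NLI labels
            "regression": nn.Linear(hidden, 1),      # e.g., similarity score
        })

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.embed(token_ids))
        pooled = h.mean(dim=1)  # mean pooling as a stand-in for [CLS]
        return self.heads[task](pooled)

# Multi-task training: alternate tasks per step so the encoder is shared.
model = MultiTaskModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
losses = {"classification": nn.CrossEntropyLoss(), "regression": nn.MSELoss()}
for step in range(2):
    for task in ("classification", "regression"):
        x = torch.randint(0, 30522, (4, 16))  # dummy token-id batch
        y = (torch.randint(0, 3, (4,)) if task == "classification"
             else torch.rand(4, 1))
        loss = losses[task](model(x, task), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```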
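The adversarial multi-task learning mentioned in the abstract is in the spirit of SMART-style smoothness-inducing regularization: perturb the input embeddings adversarially and penalize the divergence between clean and perturbed predictions. Below is a hedged, single-step simplification of that idea, reusing the toy model from the previous sketch; `forward_from_embeds` is a hypothetical helper, not an MT-DNN function.

```python
# Hedged, single-step sketch of SMART-style adversarial regularization;
# NOT MT-DNN's implementation. Reuses MultiTaskModel from the sketch above.
import torch
import torch.nn.functional as F

def forward_from_embeds(model, embeds, task):
    # Bypass the embedding table and encode precomputed embeddings directly.
    h, _ = model.encoder(embeds)
    return model.heads[task](h.mean(dim=1))

def smoothness_loss(model, embeds, task, eps=1e-5, step_size=1e-3):
    # Clean predictions are treated as a fixed target distribution.
    with torch.no_grad():
        clean = F.softmax(forward_from_embeds(model, embeds, task), dim=-1)
    noise = (torch.randn_like(embeds) * eps).requires_grad_()
    adv = forward_from_embeds(model, embeds + noise, task)
    kl = F.kl_div(F.log_softmax(adv, dim=-1), clean, reduction="batchmean")
    # Move the perturbation in the direction that most increases divergence.
    grad, = torch.autograd.grad(kl, noise)
    delta = (noise + step_size * grad / (grad.norm() + 1e-8)).detach()
    adv = forward_from_embeds(model, embeds + delta, task)
    return F.kl_div(F.log_softmax(adv, dim=-1), clean, reduction="batchmean")

# Usage (classification tasks): add the regularizer to the task loss, e.g.
#   loss = losses[task](model(x, task), y) \
#        + 1.0 * smoothness_loss(model, model.embed(x), task)
```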
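For the multi-task knowledge distillation used to compress models, the core objective is the standard soft-label distillation loss sketched below. The temperature and mixing weight are illustrative, and MT-DNN's variant additionally draws its soft targets from an ensemble of task-specific teachers.

```python
# Hedged sketch of the standard soft-label distillation objective; MT-DNN's
# multi-task distillation takes soft targets from an ensemble of teachers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: KL between temperature-softened teacher and student outputs.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(4, 3, requires_grad=True)  # dummy student logits
teacher = torch.randn(4, 3)                      # dummy (frozen) teacher logits
labels = torch.randint(0, 3, (4,))
distillation_loss(student, teacher, labels).backward()
```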
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN) into a Scalable Mechanistic Neural Network (S-MNN), we reduce the time and space complexities from cubic and quadratic in the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Learning Universal Predictors [23.18743879588599]
We explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks by leveraging meta-learning to its limits.
We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns.
Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
arXiv Detail & Related papers (2024-01-26T15:37:16Z)
- NExT-GPT: Any-to-Any Multimodal LLM [75.5656492989924]
We present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT.
We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio.
We introduce modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for it, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation.
arXiv Detail & Related papers (2023-09-11T15:02:25Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model that combines the merits of both deformable CNNs and query-based Transformers with shared gating for multi-task dense prediction (a sketch of this gated-fusion pattern appears after this list).
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- IGNNITION: Bridging the Gap Between Graph Neural Networks and Networking Systems [4.1591055164123665]
We present IGNNITION, a novel open-source framework that enables fast prototyping of Graph Neural Networks (GNNs) for networking systems.
IGNNITION is based on an intuitive high-level abstraction that hides the complexity behind GNNs.
Our results show that the GNN models produced by IGNNITION are equivalent in terms of accuracy and performance to their native implementations.
arXiv Detail & Related papers (2021-09-14T14:28:21Z)
- A Microarchitecture Implementation Framework for Online Learning with Temporal Neural Networks [1.4530235554268331]
Temporal Neural Networks (TNNs) are spiking neural networks that use time as a resource to represent and process information.
This work proposes a microarchitecture framework for implementing TNNs using standard CMOS.
arXiv Detail & Related papers (2021-05-27T15:59:54Z)
- Fast On-Device Adaptation for Spiking Neural Networks via Online-Within-Online Meta-Learning [31.78005607111787]
Spiking Neural Networks (SNNs) have recently gained popularity as machine learning models for on-device edge intelligence.
In this paper, we propose an online-within-online meta-learning rule for SNNs, termed OWOML-SNN, which enables lifelong learning on a stream of tasks.
arXiv Detail & Related papers (2021-02-21T04:28:49Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement.
To make full use of the training data, it adopts a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) in terms of low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Dynamic Sparsity Neural Networks for Automatic Speech Recognition [44.352231175123215]
We present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time (see the mask-switching sketch after this list).
A trained DSNN model can therefore greatly ease the training process and simplify deployment in diverse resource-constrained scenarios.
arXiv Detail & Related papers (2020-05-16T22:08:54Z)
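
As referenced in the deformable-mixer entry above, here is a minimal, hedged sketch of the general gated-fusion pattern between a convolutional branch and an attention branch. It shows only the pattern: the actual paper uses deformable convolutions and query-based Transformer decoders, and every name and shape below is a hypothetical placeholder.

```python
# Hedged sketch of gated fusion between CNN and attention features;
# illustrative only, not the deformable-mixer architecture itself.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)  # CNN branch
        self.attn = nn.MultiheadAttention(channels, num_heads=4,
                                          batch_first=True)       # attn branch
        self.gate = nn.Conv2d(2 * channels, channels, 1)          # learned gate

    def forward(self, x):                   # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        conv_feat = self.local(x)
        seq = x.flatten(2).transpose(1, 2)  # (B, H*W, C) for attention
        attn_feat, _ = self.attn(seq, seq, seq)
        attn_feat = attn_feat.transpose(1, 2).reshape(b, c, h, w)
        g = torch.sigmoid(self.gate(torch.cat([conv_feat, attn_feat], dim=1)))
        return g * conv_feat + (1 - g) * attn_feat  # per-pixel soft selection

feats = torch.randn(2, 64, 8, 8)
print(GatedFusion()(feats).shape)  # torch.Size([2, 64, 8, 8])
```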
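Similarly, as referenced in the Dynamic Sparsity Neural Networks entry, the sketch below shows the run-time mask-switching idea in its simplest form: one set of weights plus several precomputed sparsity masks that can be swapped instantly at inference time. The magnitude-based mask construction here is an assumption for illustration; DSNN itself trains the network jointly under its predefined sparsity configurations.

```python
# Hedged sketch of instant run-time sparsity switching: one weight matrix,
# several precomputed masks. Magnitude pruning here is illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(256, 256)
masks = {}
for sparsity in (0.5, 0.75, 0.9):
    k = int(layer.weight.numel() * sparsity)
    # Zero out the k smallest-magnitude weights for this configuration.
    threshold = layer.weight.abs().flatten().kthvalue(k).values
    masks[sparsity] = (layer.weight.abs() > threshold).float()

def sparse_forward(x, sparsity):
    # Switching sparsity is just choosing a different precomputed mask.
    w = layer.weight * masks[sparsity]
    return F.linear(x, w, layer.bias)

x = torch.randn(1, 256)
for s in masks:
    print(s, sparse_forward(x, s).shape)  # torch.Size([1, 256]) each
```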
This list is automatically generated from the titles and abstracts of the papers on this site.