The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural
Language Understanding
- URL: http://arxiv.org/abs/2002.07972v2
- Date: Fri, 15 May 2020 21:47:31 GMT
- Authors: Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel
Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao and Jianfeng Gao
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MT-DNN, an open-source natural language understanding (NLU)
toolkit that makes it easy for researchers and developers to train customized
deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed
to facilitate rapid customization for a broad spectrum of NLU tasks, using a
variety of objectives (classification, regression, structured prediction) and
text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is
its built-in support for robust and transferable learning using the adversarial
multi-task learning paradigm. To enable efficient production deployment, MT-DNN
supports multi-task knowledge distillation, which can substantially compress a
deep neural model without significant performance drop. We demonstrate the
effectiveness of MT-DNN on a wide range of NLU applications across general and
biomedical domains. The software and pre-trained models will be publicly
available at https://github.com/namisan/mt-dnn.
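
To make the multi-task design concrete, below is a minimal sketch of the shared-encoder / per-task-head pattern that MT-DNN builds on. This is not the MT-DNN API: the toy LSTM encoder (standing in for BERT, RoBERTa, or UniLM), the head names, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of multi-task learning with a shared encoder and
# task-specific heads. NOT the MT-DNN API; names and sizes are placeholders.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=30522, hidden=128):
        super().__init__()
        # Shared text encoder (MT-DNN would plug in BERT/RoBERTa/UniLM here).
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        # One lightweight output head per task, each with its own objective.
        self.heads = nn.ModuleDict({
            "classification": nn.Linear(hidden, 3),  # e.g., 3-way NLI labels
            "regression": nn.Linear(hidden, 1),      # e.g., similarity score
        })

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.embed(token_ids))
        pooled = h.mean(dim=1)  # mean pooling as a stand-in for [CLS]
        return self.heads[task](pooled)

# Multi-task training: alternate tasks per step so the encoder is shared.
model = MultiTaskModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
losses = {"classification": nn.CrossEntropyLoss(), "regression": nn.MSELoss()}
for step in range(2):
    for task in ("classification", "regression"):
        x = torch.randint(0, 30522, (4, 16))  # dummy token-id batch
        y = (torch.randint(0, 3, (4,)) if task == "classification"
             else torch.rand(4, 1))
        loss = losses[task](model(x, task), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```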
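The adversarial multi-task learning mentioned in the abstract is in the spirit of SMART-style smoothness-inducing regularization: perturb the input embeddings adversarially and penalize the divergence between clean and perturbed predictions. Below is a hedged, single-step simplification of that idea, reusing the toy model from the previous sketch; `forward_from_embeds` is a hypothetical helper, not an MT-DNN function.

```python
# Hedged, single-step sketch of SMART-style adversarial regularization;
# NOT MT-DNN's implementation. Reuses MultiTaskModel from the sketch above.
import torch
import torch.nn.functional as F

def forward_from_embeds(model, embeds, task):
    # Bypass the embedding table and encode precomputed embeddings directly.
    h, _ = model.encoder(embeds)
    return model.heads[task](h.mean(dim=1))

def smoothness_loss(model, embeds, task, eps=1e-5, step_size=1e-3):
    # Clean predictions are treated as a fixed target distribution.
    with torch.no_grad():
        clean = F.softmax(forward_from_embeds(model, embeds, task), dim=-1)
    noise = (torch.randn_like(embeds) * eps).requires_grad_()
    adv = forward_from_embeds(model, embeds + noise, task)
    kl = F.kl_div(F.log_softmax(adv, dim=-1), clean, reduction="batchmean")
    # Move the perturbation in the direction that most increases divergence.
    grad, = torch.autograd.grad(kl, noise)
    delta = (noise + step_size * grad / (grad.norm() + 1e-8)).detach()
    adv = forward_from_embeds(model, embeds + delta, task)
    return F.kl_div(F.log_softmax(adv, dim=-1), clean, reduction="batchmean")

# Usage (classification tasks): add the regularizer to the task loss, e.g.
#   loss = losses[task](model(x, task), y) \
#        + 1.0 * smoothness_loss(model, model.embed(x), task)
```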
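For the multi-task knowledge distillation used to compress models, the core objective is the standard soft-label distillation loss sketched below. The temperature and mixing weight are illustrative, and MT-DNN's variant additionally draws its soft targets from an ensemble of task-specific teachers.

```python
# Hedged sketch of the standard soft-label distillation objective; MT-DNN's
# multi-task distillation takes soft targets from an ensemble of teachers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: KL between temperature-softened teacher and student outputs.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(4, 3, requires_grad=True)  # dummy student logits
teacher = torch.randn(4, 3)                      # dummy (frozen) teacher logits
labels = torch.randint(0, 3, (4,))
distillation_loss(student, teacher, labels).backward()
```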
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN) into a Scalable Mechanistic Neural Network (S-MNN), we reduce the time and space complexities from cubic and quadratic in the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Learning Universal Predictors [23.18743879588599]
We explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks by leveraging meta-learning to its limits.
We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns.
Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
arXiv Detail & Related papers (2024-01-26T15:37:16Z)
- NExT-GPT: Any-to-Any Multimodal LLM [75.5656492989924]
We present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT.
We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio.
We introduce modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for it, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation.
arXiv Detail & Related papers (2023-09-11T15:02:25Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model that combines the merits of both deformable CNNs and query-based Transformers with shared gating for multi-task dense prediction (a sketch of this gated-fusion pattern appears after this list).
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- IGNNITION: Bridging the Gap Between Graph Neural Networks and Networking Systems [4.1591055164123665]
We present IGNNITION, a novel open-source framework that enables fast prototyping of Graph Neural Networks (GNNs) for networking systems.
IGNNITION is based on an intuitive high-level abstraction that hides the complexity behind GNNs.
Our results show that the GNN models produced by IGNNITION are equivalent in terms of accuracy and performance to their native implementations.
arXiv Detail & Related papers (2021-09-14T14:28:21Z)
- A Microarchitecture Implementation Framework for Online Learning with Temporal Neural Networks [1.4530235554268331]
Temporal Neural Networks (TNNs) are spiking neural networks that use time as a resource to represent and process information.
This work proposes a microarchitecture framework for implementing TNNs using standard CMOS.
arXiv Detail & Related papers (2021-05-27T15:59:54Z)
- Fast On-Device Adaptation for Spiking Neural Networks via Online-Within-Online Meta-Learning [31.78005607111787]
Spiking Neural Networks (SNNs) have recently gained popularity as machine learning models for on-device edge intelligence.
In this paper, we propose an online-within-online meta-learning rule for SNNs, termed OWOML-SNN, which enables lifelong learning on a stream of tasks.
arXiv Detail & Related papers (2021-02-21T04:28:49Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement.
To make full use of the training data, it adopts a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) in terms of low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Dynamic Sparsity Neural Networks for Automatic Speech Recognition [44.352231175123215]
We present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time (see the mask-switching sketch after this list).
A trained DSNN model can therefore greatly ease the training process and simplify deployment in diverse resource-constrained scenarios.
arXiv Detail & Related papers (2020-05-16T22:08:54Z)
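
As referenced in the deformable-mixer entry above, here is a minimal, hedged sketch of the general gated-fusion pattern between a convolutional branch and an attention branch. It shows only the pattern: the actual paper uses deformable convolutions and query-based Transformer decoders, and every name and shape below is a hypothetical placeholder.

```python
# Hedged sketch of gated fusion between CNN and attention features;
# illustrative only, not the deformable-mixer architecture itself.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)  # CNN branch
        self.attn = nn.MultiheadAttention(channels, num_heads=4,
                                          batch_first=True)       # attn branch
        self.gate = nn.Conv2d(2 * channels, channels, 1)          # learned gate

    def forward(self, x):                   # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        conv_feat = self.local(x)
        seq = x.flatten(2).transpose(1, 2)  # (B, H*W, C) for attention
        attn_feat, _ = self.attn(seq, seq, seq)
        attn_feat = attn_feat.transpose(1, 2).reshape(b, c, h, w)
        g = torch.sigmoid(self.gate(torch.cat([conv_feat, attn_feat], dim=1)))
        return g * conv_feat + (1 - g) * attn_feat  # per-pixel soft selection

feats = torch.randn(2, 64, 8, 8)
print(GatedFusion()(feats).shape)  # torch.Size([2, 64, 8, 8])
```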
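Similarly, as referenced in the Dynamic Sparsity Neural Networks entry, the sketch below shows the run-time mask-switching idea in its simplest form: one set of weights plus several precomputed sparsity masks that can be swapped instantly at inference time. The magnitude-based mask construction here is an assumption for illustration; DSNN itself trains the network jointly under its predefined sparsity configurations.

```python
# Hedged sketch of instant run-time sparsity switching: one weight matrix,
# several precomputed masks. Magnitude pruning here is illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(256, 256)
masks = {}
for sparsity in (0.5, 0.75, 0.9):
    k = int(layer.weight.numel() * sparsity)
    # Zero out the k smallest-magnitude weights for this configuration.
    threshold = layer.weight.abs().flatten().kthvalue(k).values
    masks[sparsity] = (layer.weight.abs() > threshold).float()

def sparse_forward(x, sparsity):
    # Switching sparsity is just choosing a different precomputed mask.
    w = layer.weight * masks[sparsity]
    return F.linear(x, w, layer.bias)

x = torch.randn(1, 256)
for s in masks:
    print(s, sparse_forward(x, s).shape)  # torch.Size([1, 256]) each
```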
This list is automatically generated from the titles and abstracts of the papers on this site.