Plug-and-Play Document Modules for Pre-trained Models
- URL: http://arxiv.org/abs/2305.17660v1
- Date: Sun, 28 May 2023 08:01:40 GMT
- Title: Plug-and-Play Document Modules for Pre-trained Models
- Authors: Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin,
Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun
- Abstract summary: We propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD).
By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks.
Experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios.
- Score: 92.9897146991974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale pre-trained models (PTMs) have been widely used in
document-oriented NLP tasks, such as question answering. However, the
encoding-task coupling requirement results in the repeated encoding of the same
documents for different tasks and queries, which is highly computationally
inefficient. To this end, we aim to decouple document encoding from
downstream tasks, and propose to represent each document as a plug-and-play
document module, i.e., a document plugin, for PTMs (PlugD). By inserting
document plugins into the backbone PTM for downstream tasks, we can encode a
document one time to handle multiple tasks, which is more efficient than
conventional encoding-task coupling methods that simultaneously encode
documents and input queries using task-specific encoders. Extensive experiments
on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode
documents once and for all across different scenarios. Notably, PlugD saves
$69\%$ of the computational cost while achieving comparable performance to
state-of-the-art encoding-task coupling methods. Additionally, we show that
PlugD can serve as an effective post-processing method for injecting knowledge
into task-specific models, improving model performance without any additional
model training.
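The central mechanism — compressing a document once into a small set of reusable plugin vectors that a frozen backbone attends to alongside any task input — can be sketched roughly as follows. This is a minimal illustration under assumed module names, dimensions, and a cross-attention compression step; it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DocumentPlugin(nn.Module):
    """Compress a document into a small, reusable set of plugin vectors (hypothetical sketch)."""
    def __init__(self, hidden=256, num_plugin_vectors=16, heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_plugin_vectors, hidden))
        self.compress = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, doc_states):
        # doc_states: (batch, doc_len, hidden), e.g. hidden states from a frozen encoder pass
        q = self.latents.unsqueeze(0).expand(doc_states.size(0), -1, -1)
        plugin, _ = self.compress(q, doc_states, doc_states)  # (batch, k, hidden)
        return plugin

class PluginConditionedLayer(nn.Module):
    """A backbone-style layer whose attention also sees the plugged-in document vectors."""
    def __init__(self, hidden=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))

    def forward(self, query_states, plugin):
        # Prepend plugin vectors as extra keys/values, so the task input can "read" the document.
        kv = torch.cat([plugin, query_states], dim=1)
        attended, _ = self.attn(query_states, kv, kv)
        return query_states + self.ff(attended)

# Encode the document once ...
doc_states = torch.randn(1, 512, 256)          # stand-in for document token states
plugin = DocumentPlugin()(doc_states)           # cache this per document

# ... then reuse the same cached plugin for different tasks and queries.
layer = PluginConditionedLayer()
qa_states = layer(torch.randn(1, 32, 256), plugin)    # e.g. a QA query
cls_states = layer(torch.randn(1, 48, 256), plugin)   # e.g. a classification input
```

The point of the sketch is the reuse pattern: the plugin is computed once per document and cached, while different task inputs pass through the same conditioned backbone.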
Related papers
- GRAM: Global Reasoning for Multi-Page VQA [14.980413646626234]
We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting.
To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens.
For additional computational savings during decoding, we introduce an optional compression stage.
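A rough sketch of the page-level-plus-global pattern, assuming a hypothetical DocumentLevelLayer, arbitrary shapes, and a single document-level layer; GRAM's actual layer placement and token design may differ.

```python
import torch
import torch.nn as nn

class DocumentLevelLayer(nn.Module):
    """Hypothetical document-level layer: learnable tokens mix with all page encodings."""
    def __init__(self, hidden=256, num_doc_tokens=8, heads=4):
        super().__init__()
        self.doc_tokens = nn.Parameter(torch.randn(num_doc_tokens, hidden))
        self.layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)

    def forward(self, page_encodings):
        # page_encodings: (batch, num_pages * tokens_per_page, hidden) from a single-page encoder
        doc = self.doc_tokens.unsqueeze(0).expand(page_encodings.size(0), -1, -1)
        mixed = self.layer(torch.cat([doc, page_encodings], dim=1))
        return mixed[:, : self.doc_tokens.size(0)]  # updated document-level tokens

pages = torch.randn(1, 4 * 128, 256)        # 4 pages, 128 tokens each, already page-encoded
doc_summary = DocumentLevelLayer()(pages)    # (1, 8, 256) global tokens for multi-page reasoning
```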
arXiv Detail & Related papers (2024-01-07T08:03:06Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization [66.08074487429477]
Pre-trained language models (PLMs) have achieved outstanding results in abstractive single-document summarization (SDS).
We propose a new method to better utilize a PLM to facilitate multi-document interactions for the multi-document summarization (MDS) task.
Our method outperforms its corresponding PLM backbone by up to 3 Rouge-L points and is favored by human evaluators.
arXiv Detail & Related papers (2023-05-15T10:03:31Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Existing methods typically encode task information by using a simple dataset name as a prefix to the encoder input.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
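As an illustration, a compositional configuration might be assembled as a textual prefix like the sketch below; the field names, markers, and linearization format are assumptions, not the paper's exact prompts.

```python
def compose_configuration(task_type: str, input_type: str, output_type: str) -> str:
    # Hypothetical configuration prompt; the idea is to compose such pieces rather than
    # prepend a single dataset name.
    return f"<task> {task_type} <input> {input_type} <output> {output_type}"

def build_encoder_input(config: str, linearized_table: str, question: str = "") -> str:
    return " ".join(part for part in (config, question, linearized_table) if part)

config = compose_configuration("table question answering", "table", "short answer")
print(build_encoder_input(config, "col: name | year row: Alice | 2020", "Who joined in 2020?"))
```

Because the configuration is composed from reusable fields, new combinations of task, input, and output types can be formed at inference time without retraining.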
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization [12.799359904396624]
One key challenge in multi-document summarization is to capture the relations among input documents, which distinguishes multi-document summarization (MDS) from single-document summarization (SDS).
We propose document-aware positional encoding and linguistic-guided encoding that can be fused with Transformer architecture for MDS.
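A minimal sketch of the document-aware part: a per-document embedding is added on top of ordinary position embeddings so each token carries its source-document identity. The embedding scheme is an assumption, and the linguistic-guided component is omitted.

```python
import torch
import torch.nn as nn

class DocumentAwarePositionalEncoding(nn.Module):
    """Adds a per-document embedding on top of ordinary position embeddings (sketch)."""
    def __init__(self, hidden=256, max_len=4096, max_docs=16):
        super().__init__()
        self.pos = nn.Embedding(max_len, hidden)
        self.doc = nn.Embedding(max_docs, hidden)

    def forward(self, token_states, doc_ids):
        # token_states: (batch, seq_len, hidden); doc_ids: (batch, seq_len) document index per token
        positions = torch.arange(token_states.size(1), device=token_states.device)
        return token_states + self.pos(positions) + self.doc(doc_ids)

states = torch.randn(1, 6, 256)
doc_ids = torch.tensor([[0, 0, 0, 1, 1, 1]])   # first three tokens from doc 0, rest from doc 1
out = DocumentAwarePositionalEncoding()(states, doc_ids)
```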
arXiv Detail & Related papers (2022-09-13T12:22:38Z)
- Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval [79.37614949970013]
We propose a new dense retrieval model which learns diverse document representations with deep query interactions.
Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations.
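A rough sketch of multi-view scoring: each document keeps one vector per generated pseudo-query, and retrieval scores a document by its best-matching view. The encoders are omitted here, and the max-over-views choice is an assumption.

```python
import torch

def multi_view_scores(query_vec, doc_view_vecs):
    """query_vec: (hidden,); doc_view_vecs: (num_docs, num_views, hidden).
    Score each document by its best-matching pseudo-query view (sketch)."""
    sims = torch.einsum("h,dvh->dv", query_vec, doc_view_vecs)   # (num_docs, num_views)
    return sims.max(dim=1).values                                # (num_docs,)

# At indexing time (done once): encode each document together with its generated
# pseudo-queries to obtain several query-informed views per document.
doc_views = torch.randn(1000, 4, 256)    # 1000 documents, 4 views each (stand-in vectors)
query = torch.randn(256)
top = multi_view_scores(query, doc_views).topk(5)
print(top.indices)
```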
arXiv Detail & Related papers (2022-08-08T16:00:55Z)
- MuLD: The Multitask Long Document Benchmark [4.835289158553091]
We present a new long document benchmark consisting of only documents over 10,000 tokens.
We show that models with increased context length are better able to solve the tasks presented.
arXiv Detail & Related papers (2022-02-15T12:42:55Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for KIE require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
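As a heavily simplified stand-in for the proposed partial graph matching (which the paper learns end to end), one-shot KIE can be viewed as assigning boxes in a new document to the labeled fields of a single support document. The sketch below uses fixed stand-in features, cosine costs, and the Hungarian algorithm, with a threshold to keep the matching partial.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_fields(support_feats, support_labels, target_feats, max_cost=0.5):
    """support_feats: (m, d) features of labeled boxes in the one support document;
    target_feats: (n, d) features of boxes in a new document. Returns label per target box."""
    # Cosine-distance cost matrix between target boxes and support boxes.
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ s.T                                   # (n, m)
    rows, cols = linear_sum_assignment(cost)               # one-to-one assignment
    labels = {}
    for r, c in zip(rows, cols):
        if cost[r, c] <= max_cost:                         # "partial": drop weak matches
            labels[r] = support_labels[c]
    return labels

support = np.random.randn(5, 64)    # stand-in box features; a real system would learn these
target = np.random.randn(8, 64)
print(match_fields(support, ["invoice_no", "date", "total", "vendor", "tax"], target))
```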
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- DynE: Dynamic Ensemble Decoding for Multi-Document Summarization [5.197307534263253]
We propose a simple decoding methodology which ensembles the output of multiple instances of the same model on different inputs.
We obtain state-of-the-art results on several multi-document summarization datasets.
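The core decoding trick — running the same seq2seq model once per source document and averaging the next-token distributions at every step — can be sketched as below. The encode/decode_step model API, the greedy loop, and probability-space averaging are assumptions for illustration.

```python
import torch

@torch.no_grad()
def dynamic_ensemble_greedy(model, doc_inputs, bos_id, eos_id, max_len=64):
    """doc_inputs: list of encoder inputs, one per source document (all share one target).
    At each step, average the per-document next-token distributions (sketch)."""
    encoded = [model.encode(x) for x in doc_inputs]        # hypothetical model API
    target = [bos_id]
    for _ in range(max_len):
        step_probs = [
            torch.softmax(model.decode_step(enc, torch.tensor([target])), dim=-1)
            for enc in encoded                             # one forward pass per document
        ]
        avg = torch.stack(step_probs).mean(dim=0)          # ensemble over documents
        next_id = int(avg[0, -1].argmax())
        target.append(next_id)
        if next_id == eos_id:
            break
    return target
```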
arXiv Detail & Related papers (2020-06-15T20:40:06Z)