DynE: Dynamic Ensemble Decoding for Multi-Document Summarization
- URL: http://arxiv.org/abs/2006.08748v1
- Date: Mon, 15 Jun 2020 20:40:06 GMT
- Title: DynE: Dynamic Ensemble Decoding for Multi-Document Summarization
- Authors: Chris Hokamp, Demian Gholipour Ghalandari, Nghia The Pham, John Glover
- Abstract summary: We propose a simple decoding methodology which ensembles the output of multiple instances of the same model on different inputs.
We obtain state-of-the-art results on several multi-document summarization datasets.
- Score: 5.197307534263253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence (s2s) models are the basis for extensive work in natural
language processing. However, some applications, such as multi-document
summarization, multi-modal machine translation, and the automatic post-editing
of machine translation, require mapping a set of multiple distinct inputs into
a single output sequence. Recent work has introduced bespoke architectures for
these multi-input settings, and developed models which can handle increasingly
longer inputs; however, the performance of special model architectures is
limited by the available in-domain training data. In this work we propose a
simple decoding methodology which ensembles the output of multiple instances of
the same model on different inputs. Our proposed approach allows models trained
for vanilla s2s tasks to be directly used in multi-input settings. This works
particularly well when each of the inputs has significant overlap with the
others, as when compressing a cluster of news articles about the same event
into a single coherent summary, and we obtain state-of-the-art results on
several multi-document summarization datasets.
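Concretely, dynamic ensemble decoding runs the same model once per input document and combines the per-step next-token distributions before each token is emitted, so every decoder instance stays in sync on a shared output prefix. The sketch below is a minimal illustration rather than the authors' implementation: `step_fn` is a hypothetical callable that returns next-token log-probabilities for one input given the shared prefix, decoding is greedy instead of beam search, and uniform log-space averaging is one plausible combination rule.

```python
# Minimal sketch of dynamic ensemble decoding, assuming a hypothetical
# step_fn(input_doc, prefix) -> np.ndarray of next-token log-probs.
import numpy as np

def dyne_greedy_decode(step_fn, inputs, bos_id, eos_id, max_len=64):
    """Greedily decode one output shared across all inputs, averaging
    the per-input next-token log-probabilities at every step."""
    prefix = [bos_id]
    for _ in range(max_len):
        # One forward pass per input document, all conditioned on the
        # same output prefix.
        step_scores = np.stack([step_fn(doc, prefix) for doc in inputs])
        # Ensemble: average the per-input distributions (uniform weights).
        avg_scores = step_scores.mean(axis=0)
        next_id = int(avg_scores.argmax())
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix
```

A full implementation would wrap a trained single-input summarizer behind `step_fn` and replace the greedy argmax with beam search over the averaged scores; nothing about the model itself changes, which is what lets off-the-shelf s2s models handle multi-input settings.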
Related papers
- VIMI: Grounding Video Generation through Multi-modal Instruction [89.90065445082442]
Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining.
We construct a large-scale multimodal prompt dataset by employing retrieval methods to pair in-context examples with the given text prompts.
We finetune the model from the first stage on three video generation tasks, incorporating multi-modal instructions.
arXiv Detail & Related papers (2024-07-08T18:12:49Z) - AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling [115.89786751297348]
We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities.
We build a multimodal text-centric dataset for multimodal alignment pre-training.
We show that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities.
arXiv Detail & Related papers (2024-02-19T15:33:10Z) - Task-Based MoE for Multitask Multilingual Machine Translation [58.20896429151824]
The mixture-of-experts (MoE) architecture has proven to be a powerful method for training deep models on diverse tasks in many applications.
In this work, we design a novel method that incorporates task information into MoE models at different levels of granularity with shared dynamic task-based adapters.
arXiv Detail & Related papers (2023-08-30T05:41:29Z) - Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend to the multimodal input.
arXiv Detail & Related papers (2023-03-13T17:01:42Z) - OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist
Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z) - VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface
Modeling [11.569380762858815]
VUT is a Versatile UI Transformer that takes multimodal input and simultaneously accomplishes 5 distinct tasks with the same model.
Our model consists of a multimodal Transformer encoder that jointly encodes UI images and structures, and performs UI object detection when the UI structures are absent in the input.
arXiv Detail & Related papers (2021-12-10T17:37:26Z) - PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document
Summarization [16.830963601598242]
We propose PRIMER, a pre-trained model for multi-document representation with focus on summarization.
Specifically, we adopt the Longformer architecture with proper input transformation and global attention to fit multi-document inputs.
Our model, PRIMER, outperforms current state-of-the-art models in most of these settings by large margins.
arXiv Detail & Related papers (2021-10-16T07:22:24Z) - Unsupervised Multimodal Language Representations using Convolutional
Autoencoders [5.464072883537924]
We propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks.
We map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets.
It is also shown that our method is extremely lightweight and can be easily generalized to other tasks and unseen data with only a small performance drop and almost the same number of parameters.
arXiv Detail & Related papers (2021-10-06T18:28:07Z) - Representing Unordered Data Using Complex-Weighted Multiset Automata [23.68657135308002]
We show how the multiset representations of certain existing neural architectures can be viewed as special cases of ours.
Namely, we provide a new theoretical and intuitive justification for the Transformer model's representation of positions using sinusoidal functions (the standard encoding is sketched after this list).
We extend the DeepSets model to use complex numbers, enabling it to outperform the existing model on an extension of one of their tasks.
arXiv Detail & Related papers (2020-01-02T20:04:45Z)
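For reference on the last entry: the sinusoidal position representation it re-derives is the standard Transformer encoding from "Attention Is All You Need". The sketch below only makes that formula concrete; it assumes an even model dimension and is not code from the paper.

```python
# Standard sinusoidal position encoding:
#   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
#   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
# Assumes d_model is even.
import numpy as np

def sinusoidal_positions(num_positions, d_model):
    positions = np.arange(num_positions)[:, None]            # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (P, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe
```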
This list is automatically generated from the titles and abstracts of the papers on this site.