HydraSum -- Disentangling Stylistic Features in Text Summarization using
Multi-Decoder Models
- URL: http://arxiv.org/abs/2110.04400v1
- Date: Fri, 8 Oct 2021 22:49:49 GMT
- Title: HydraSum -- Disentangling Stylistic Features in Text Summarization using
Multi-Decoder Models
- Authors: Tanya Goyal, Nazneen Fatema Rajani, Wenhao Liu, Wojciech
Kry\'sci\'nski
- Abstract summary: We introduce HydraSum, a new summarization architecture that extends the single decoder framework of current models.
Our proposed model encourages each expert, i.e. decoder, to learn and generate stylistically-distinct summaries.
A guided version of the training process can explicitly govern which summary style is partitioned between decoders.
- Score: 12.070474521259776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing abstractive summarization models lack explicit control mechanisms
that would allow users to influence the stylistic features of the model
outputs. This results in generating generic summaries that do not cater to the
users needs or preferences. To address this issue we introduce HydraSum, a new
summarization architecture that extends the single decoder framework of current
models, e.g. BART, to a mixture-of-experts version consisting of multiple
decoders. Our proposed model encourages each expert, i.e. decoder, to learn and
generate stylistically-distinct summaries along dimensions such as
abstractiveness, length, specificity, and others. At each time step, HydraSum
employs a gating mechanism that decides the contribution of each individual
decoder to the next token's output probability distribution. Through
experiments on three summarization datasets (CNN, Newsroom, XSum), we
demonstrate that this gating mechanism automatically learns to assign
contrasting summary styles to different HydraSum decoders under the standard
training objective without the need for additional supervision. We further show
that a guided version of the training process can explicitly govern which
summary style is partitioned between decoders, e.g. high abstractiveness vs.
low abstractiveness or high specificity vs. low specificity, and also increase
the stylistic-difference between individual decoders. Finally, our experiments
demonstrate that our decoder framework is highly flexible: during inference, we
can sample from individual decoders or mixtures of different subsets of the
decoders to yield a diverse set of summaries and enforce single- and
multi-style control over summary generation.
Related papers
- Universal Medical Image Representation Learning with Compositional Decoders [36.36271760800241]
We develop a decomposed-composed universal medical imaging paradigm (UniMed) that supports tasks at all levels.
We propose a decomposed decoder that can predict two types of outputs -- pixel and semantic, based on a defined input queue.
We introduce a composed decoder that unifies the input and output spaces and standardizes task annotations across different levels into a discrete token format.
arXiv Detail & Related papers (2024-09-30T02:39:42Z) - DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z) - Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations.
We introduce Generalized generative adversarial-Decoding Diffusion Probabilistic Models (EDDPMs)
EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding.
Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
arXiv Detail & Related papers (2024-02-29T10:08:57Z) - Triple-View Knowledge Distillation for Semi-Supervised Semantic
Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z) - Sequence-to-Sequence Pre-training with Unified Modality Masking for
Visual Document Understanding [3.185382039518151]
GenDoc is a sequence-to-sequence document understanding model pre-trained with unified masking across three modalities.
The proposed model utilizes an encoder-decoder architecture, which allows for increased adaptability to a wide range of downstream tasks.
arXiv Detail & Related papers (2023-05-16T15:25:19Z) - String-based Molecule Generation via Multi-decoder VAE [56.465033997245776]
We investigate the problem of string-based molecular generation via variational autoencoders (VAEs)
We propose a simple, yet effective idea to improve the performance of VAE for the task.
In our experiments, the proposed VAE model particularly performs well for generating a sample from out-of-domain distribution.
arXiv Detail & Related papers (2022-08-23T03:56:30Z) - CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z) - Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z) - Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven
Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders---a sequential document encoder and a graph-structured encoder---to maintain the global context and local characteristics of entities.
Results show that our models produce significantly higher ROUGE scores than a variant without knowledge graph as input on both New York Times and CNN/Daily Mail datasets.
arXiv Detail & Related papers (2020-05-03T18:23:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.