A Generative Approach to Titling and Clustering Wikipedia Sections
- URL: http://arxiv.org/abs/2005.11216v1
- Date: Fri, 22 May 2020 14:49:07 GMT
- Title: A Generative Approach to Titling and Clustering Wikipedia Sections
- Authors: Anjalie Field, Sascha Rothe, Simon Baumgartner, Cong Yu, and Abe
Ittycheriah
- Abstract summary: We evaluate transformer encoders with various decoders for information organization through a new task: generation of section headings for Wikipedia articles.
Our analysis shows that decoders containing attention mechanisms over the encoder output achieve high-scoring results by generating extractive text.
A decoder without attention better facilitates semantic encoding and can be used to generate section embeddings.
- Score: 12.154365109117025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We evaluate the performance of transformer encoders with various decoders for
information organization through a new task: generation of section headings for
Wikipedia articles. Our analysis shows that decoders containing attention
mechanisms over the encoder output achieve high-scoring results by generating
extractive text. In contrast, a decoder without attention better facilitates
semantic encoding and can be used to generate section embeddings. We
additionally introduce a new loss function, which further encourages the
decoder to generate high-quality embeddings.
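To make the contrast concrete, here is a minimal PyTorch sketch (hypothetical module and dimension names, not the authors' system): with no cross-attention in the decoder, the encoder output must be pooled into a single section embedding, which becomes the only conditioning signal the title generator sees.
```python
# Minimal sketch (assumed names/dims, not the paper's code): a decoder with no
# cross-attention must generate the heading from a single pooled section embedding.
import torch
import torch.nn as nn

class SectionTitler(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # "Decoder without attention": a plain causal Transformer stack over the
        # embedding-prefixed title tokens stands in for it here.
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def encode(self, section_ids):
        # Mean-pool the encoder states into one fixed-size section embedding.
        return self.encoder(self.embed(section_ids)).mean(dim=1)  # (B, d_model)

    def forward(self, section_ids, title_ids):
        z = self.encode(section_ids).unsqueeze(1)             # (B, 1, d_model)
        x = torch.cat([z, self.embed(title_ids)], dim=1)      # prefix conditioning
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.lm_head(self.decoder(x, mask=mask))       # next-token logits

model = SectionTitler()
logits = model(torch.randint(0, 32000, (2, 64)), torch.randint(0, 32000, (2, 8)))
print(logits.shape)  # torch.Size([2, 9, 32000])
```
The pooled vector returned by `encode` is what could double as a section embedding for clustering in this setup.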
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction quality (rFID) and generation quality.
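As a rough illustration of "denoising as decoding" (toy shapes and a made-up refinement schedule, not the ε-VAE architecture), the single decoder pass can be replaced by an iterative loop that refines noise conditioned on the encoder latent:
```python
# Toy sketch (assumed shapes/schedule, not the epsilon-VAE model): iterative
# refinement of noise conditioned on the encoder latent, vs. one decoder pass.
import torch
import torch.nn as nn

class LatentConditionedRefiner(nn.Module):
    def __init__(self, img_dim=784, latent_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, x_t, z):
        # Predict a refined image estimate from the current noisy estimate and latent.
        return self.net(torch.cat([x_t, z], dim=-1))

def decode_by_denoising(refiner, z, img_dim=784, steps=10):
    x = torch.randn(z.size(0), img_dim)        # start from pure noise
    for t in range(steps):                     # iterative refinement loop
        x_hat = refiner(x, z)                  # conditioned on encoder latent z
        alpha = (t + 1) / steps                # simple interpolation schedule
        x = (1 - alpha) * x + alpha * x_hat    # move toward the prediction
    return x

refiner = LatentConditionedRefiner()
z = torch.randn(4, 32)                         # latents from some encoder
recon = decode_by_denoising(refiner, z)
print(recon.shape)  # torch.Size([4, 784])
```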
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers [6.405360669408265]
We propose a simple, new method to analyze encoder-decoder Transformers: DecoderLens.
Inspired by the LogitLens (for decoder-only Transformers), this method lets the decoder cross-attend to representations from intermediate encoder layers.
We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation.
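A small sketch of the general idea, with assumed layer counts and dimensions rather than the DecoderLens codebase: run the encoder, keep every layer's hidden states, and let the decoder cross-attend to an intermediate layer instead of the final one.
```python
# Minimal sketch (assumed dims, not the DecoderLens implementation): the decoder
# cross-attends to an intermediate encoder layer instead of the final one.
import torch
import torch.nn as nn

d_model, n_heads, vocab = 64, 4, 1000
embed = nn.Embedding(vocab, d_model)
enc_layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(6)]
)
dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab)

src = embed(torch.randint(0, vocab, (2, 16)))
tgt = embed(torch.randint(0, vocab, (2, 5)))

# Collect hidden states after every encoder layer.
states, h = [], src
for layer in enc_layers:
    h = layer(h)
    states.append(h)

# "Lens": decode from layer 3's representations rather than the final layer's.
intermediate = states[2]
logits = lm_head(decoder(tgt, memory=intermediate))
print(logits.shape)  # torch.Size([2, 5, 1000])
```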
arXiv Detail & Related papers (2023-10-05T17:04:59Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
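A minimal sketch of that decoder-only formulation (assumed separator token and toy model, not any specific system): source and target are concatenated into one sequence, a causal LM scores it, and the loss is restricted to the target positions.
```python
# Sketch (assumed separator/toy LM): decoder-only seq2seq concatenates source and
# target, trains a causal language model, and masks the loss to the target side.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    def __init__(self, vocab=1000, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, ids):
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.blocks(self.embed(ids), mask=mask))

def decoder_only_loss(lm, src_ids, tgt_ids, sep_id=0):
    sep = torch.full((src_ids.size(0), 1), sep_id, dtype=torch.long)
    seq = torch.cat([src_ids, sep, tgt_ids], dim=1)     # [source ; SEP ; target]
    logits = lm(seq)                                    # causal next-token logits
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),    # predict token t+1 from prefix
        seq[:, 1:].reshape(-1),
        reduction="none",
    ).view(seq.size(0), -1)
    return loss[:, src_ids.size(1):].mean()             # score only SEP->target steps

lm = TinyCausalLM()
loss = decoder_only_loss(lm, torch.randint(1, 1000, (2, 12)), torch.randint(1, 1000, (2, 6)))
print(loss.item())
```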
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- On the Sub-Layer Functionalities of Transformer Decoder [74.83087937309266]
We study how Transformer-based decoders leverage information from the source and target languages.
Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance.
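As an illustration only (assumed dimensions, not the paper's code), a decoder block can be written so its residual feed-forward branch is optional, which makes the parameter savings from dropping it easy to see:
```python
# Sketch (assumed dims): a decoder block whose residual feed-forward sub-layer can
# be switched off, as the paper reports is largely safe for performance.
import torch
import torch.nn as nn

class SlimDecoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, use_ffn=True):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.use_ffn = use_ffn
        if use_ffn:
            self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                     nn.Linear(4 * d_model, d_model))
            self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        x = self.norm2(x + self.cross_attn(x, memory, memory)[0])
        if self.use_ffn:                      # drop this residual branch entirely
            x = self.norm3(x + self.ffn(x))
        return x

x, mem = torch.randn(2, 5, 64), torch.randn(2, 16, 64)
full, slim = SlimDecoderBlock(use_ffn=True), SlimDecoderBlock(use_ffn=False)
n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(full), n_params(slim), slim(x, mem).shape)
```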
arXiv Detail & Related papers (2020-10-06T11:50:54Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per layer.
We propose a new architecture based on a decoder that uses a set of shallow networks to capture more information content.
To further improve the architecture, we introduce a weight function that re-balances classes, increasing the networks' attention to under-represented objects.
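One simple, hypothetical instance of such re-balancing (inverse-frequency weights, not necessarily the paper's weight function) is a class-weighted cross-entropy over pixels:
```python
# Sketch (assumed weighting scheme): inverse-frequency class weights push the loss
# toward under-represented segmentation classes.
import torch
import torch.nn.functional as F

def rebalanced_seg_loss(logits, labels, eps=1e-6):
    # logits: (B, C, H, W) per-pixel class scores; labels: (B, H, W) class ids.
    num_classes = logits.size(1)
    counts = torch.bincount(labels.view(-1), minlength=num_classes).float()
    weights = 1.0 / (counts + eps)                    # rare classes get large weights
    weights = weights * num_classes / weights.sum()   # normalize to mean ~1
    return F.cross_entropy(logits, labels, weight=weights)

logits = torch.randn(2, 5, 32, 32)
labels = torch.randint(0, 5, (2, 32, 32))
print(rebalanced_seg_loss(logits, labels).item())
```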
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer, which serve as a global view, are supplemented with those from other encoder layers to form a stereoscopic view of the source sequences.
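A toy sketch of this idea, with simple averaging as a stand-in for the paper's fusion mechanism: each decoder layer attends to a memory built from the last encoder layer plus one earlier layer.
```python
# Sketch (fusion by averaging is an assumption): each decoder layer sees the last
# encoder layer (global view) plus one earlier encoder layer (a second view).
import torch
import torch.nn as nn

d_model, n_heads, vocab = 64, 4, 1000
embed = nn.Embedding(vocab, d_model)
enc_layers = nn.ModuleList([nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                            for _ in range(4)])
dec_layers = nn.ModuleList([nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
                            for _ in range(4)])

src = embed(torch.randint(0, vocab, (2, 16)))
tgt = embed(torch.randint(0, vocab, (2, 5)))

# Keep every encoder layer's output.
enc_states, h = [], src
for layer in enc_layers:
    h = layer(h)
    enc_states.append(h)

# Decoder layer i attends to the last encoder layer fused with encoder layer i.
y = tgt
for i, layer in enumerate(dec_layers):
    memory = 0.5 * (enc_states[-1] + enc_states[i])
    y = layer(y, memory)
print(y.shape)  # torch.Size([2, 5, 64])
```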
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
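A simplified sketch of gating encoder outputs (plain sigmoid gates standing in for the hard-concrete relaxation typically used for expected-L0 training; not the paper's implementation):
```python
# Simplified sketch: gate each encoder output position and penalize the expected
# number of open gates so the decoder attends to a sparse subset of source states.
import torch
import torch.nn as nn

class EncoderOutputGate(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, enc_out):
        probs = torch.sigmoid(self.scorer(enc_out))  # (B, L, 1) open-gate probabilities
        gated = enc_out * probs                      # softly pruned encoder states
        expected_l0 = probs.sum(dim=1).mean()        # expected count of open gates
        return gated, expected_l0

gate = EncoderOutputGate()
enc_out = torch.randn(2, 16, 64)
gated, l0 = gate(enc_out)
# Total loss would be task_loss + lambda * l0 (lambda is an assumed hyperparameter).
print(gated.shape, l0.item())
```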
arXiv Detail & Related papers (2020-04-24T16:57:52Z)