Abstractive and mixed summarization for long-single documents
- URL: http://arxiv.org/abs/2007.01918v1
- Date: Fri, 3 Jul 2020 19:30:28 GMT
- Title: Abstractive and mixed summarization for long-single documents
- Authors: Roger Barrull, Jugal Kalita
- Abstract summary: This paper uses scientific papers as the dataset on which different models are trained.
In this work, six different models are compared, two with an RNN architecture, one with a CNN architecture, two with a Transformer architecture and one with a Transformer architecture combined with reinforcement learning.
- Score: 2.792030485253753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of diversity in the datasets available for automatic summarization
of documents has meant that the vast majority of neural models for automatic
summarization have been trained with news articles. These datasets consist of
relatively short documents, averaging about 600 words, so the performance of
models trained on them is limited to short documents.
In order to surmount this problem, this paper uses scientific papers as the
dataset on which different models are trained. These models have been chosen
based on their performance on the CNN/Daily Mail dataset, so that the highest-ranked
model of each architectural variant is selected. In this work, six
different models are compared, two with an RNN architecture, one with a CNN
architecture, two with a Transformer architecture and one with a Transformer
architecture combined with reinforcement learning. The results from this work
show that the models that use a hierarchical encoder to model the structure
of the document perform better than the rest.
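The paper reports results rather than code, but the finding above is easier to picture with a small example. Below is a minimal, hypothetical sketch (in PyTorch) of a hierarchical encoder in the spirit the abstract describes: a word-level recurrent encoder turns each sentence into a vector, and a sentence-level encoder runs over those vectors to model document structure. The class names, dimensions, and the choice of GRU layers are illustrative assumptions, not the models evaluated in the paper.

```python
# Minimal sketch of a hierarchical document encoder (illustrative only):
# a word-level GRU encodes each sentence into a vector, and a sentence-level
# GRU encodes the sequence of sentence vectors into document-aware states.
import torch
import torch.nn as nn


class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Word-level encoder: runs over the tokens of each sentence.
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        # Sentence-level encoder: runs over the resulting sentence vectors.
        self.sent_rnn = nn.GRU(2 * hid_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, docs: torch.Tensor) -> torch.Tensor:
        # docs: (batch, n_sents, n_words) of token ids, 0 = padding.
        batch, n_sents, n_words = docs.shape
        words = self.embedding(docs.view(batch * n_sents, n_words))
        _, h = self.word_rnn(words)                  # h: (2, batch*n_sents, hid_dim)
        sent_vecs = torch.cat([h[0], h[1]], dim=-1)  # one vector per sentence
        sent_vecs = sent_vecs.view(batch, n_sents, -1)
        sent_states, _ = self.sent_rnn(sent_vecs)    # document-aware sentence states
        return sent_states                           # (batch, n_sents, 2*hid_dim)


# Usage: a batch of 2 documents, each padded to 10 sentences of 30 tokens.
enc = HierarchicalEncoder(vocab_size=5000)
out = enc(torch.randint(0, 5000, (2, 10, 30)))
print(out.shape)  # torch.Size([2, 10, 512])
```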
Related papers
- Two are better than one: Context window extension with multi-grained self-injection [111.1376461868317]
SharedLLM is a novel approach grounded in the design philosophy of multi-grained context compression and query-aware information retrieval.
We introduce a specialized tree-style data structure to efficiently encode, store and retrieve multi-grained contextual information for text chunks.
arXiv Detail & Related papers (2024-10-25T06:08:59Z)
- Transfer Learning with Point Transformers [3.678615604632945]
Point Transformers are state-of-the-art models for classification, segmentation, and detection on Point Cloud data.
We explore two things: the classification performance of these attention-based networks on the ModelNet10 dataset, and the use of the trained model to classify the 3D MNIST dataset after fine-tuning.
arXiv Detail & Related papers (2024-04-01T01:23:58Z)
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants [65.47222691674074]
The Muffin framework employs pre-trained vision-language models to act as providers of visual signals.
The UniMM-Chat dataset explores the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions.
arXiv Detail & Related papers (2023-10-01T12:35:18Z)
- Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion [54.33764537135906]
VideoQA Transformer models demonstrate competitive performance on standard benchmarks.
Do these models capture the rich multimodal structures and dynamics from video and text jointly?
Are they achieving high scores by exploiting biases and spurious features?
arXiv Detail & Related papers (2023-06-15T06:45:46Z)
- Comparing a composite model versus chained models to locate a nearest visual object [0.6882042556551609]
We investigate the selection of an appropriate artificial neural network model for extracting information from geographic images and text.
Our results showed that these two architectures achieved the same level of performance, with root mean square errors (RMSE) of 0.055 and 0.056, respectively.
When the task can be decomposed into sub-tasks, the chain architecture exhibits a twelve-fold increase in training speed compared to the composite model.
arXiv Detail & Related papers (2023-06-02T13:58:59Z)
- Parameter-Efficient Abstractive Question Answering over Tables or Text [60.86457030988444]
A long-term ambition of information seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory-intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables.
To avoid training such memory-hungry models while utilizing a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottleneck layers between transformer layers (a minimal, generic adapter sketch is given after this list).
arXiv Detail & Related papers (2022-04-07T10:56:29Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization [16.830963601598242]
We propose PRIMER, a pre-trained model for multi-document representation with focus on summarization.
Specifically, we adopt the Longformer architecture with proper input transformation and global attention to fit for multi-document inputs.
Our model, PRIMER, outperforms current state-of-the-art models on most of these settings with large margins.
arXiv Detail & Related papers (2021-10-16T07:22:24Z)
- Revisiting Deep Learning Models for Tabular Data [40.67427600770095]
It is unclear to both researchers and practitioners which models perform best.
The first one is a ResNet-like architecture which turns out to be a strong baseline that is often missing in prior works.
The second model is our simple adaptation of the Transformer architecture for tabular data, which outperforms other solutions on most tasks.
arXiv Detail & Related papers (2021-06-22T17:58:10Z)
- Data Augmentation for Abstractive Query-Focused Multi-Document Summarization [129.96147867496205]
We present two QMDS training datasets, which we construct using two data augmentation methods.
These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries.
We build end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets.
arXiv Detail & Related papers (2021-03-02T16:57:01Z)
- NASE: Learning Knowledge Graph Embedding for Link Prediction via Neural Architecture Search [9.634626241415916]
Link prediction is the task of predicting missing connections between entities in the knowledge graph (KG).
Previous work has tried to use Automated Machine Learning (AutoML) to search for the best model for a given dataset.
We propose a novel Neural Architecture Search (NAS) framework for the link prediction task.
arXiv Detail & Related papers (2020-08-18T03:34:09Z)
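For the adapter mechanism mentioned in the Parameter-Efficient Abstractive Question Answering entry above, the following is a minimal, generic sketch of a bottleneck adapter: a down-projection, a nonlinearity, an up-projection, and a residual connection, inserted alongside frozen transformer layers. It follows the common Houlsby-style design rather than the exact module used in that paper; all names and dimensions are illustrative assumptions.

```python
# Minimal, illustrative sketch of a parameter-efficient bottleneck adapter
# (generic Houlsby-style design, not the exact module from the paper above).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small trainable layer inserted after a (frozen) transformer sub-layer."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen model's representation
        # intact when the adapter is initialized near zero.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Usage: wrap a frozen transformer layer's output with a trainable adapter.
adapter = BottleneckAdapter()
hidden = torch.randn(2, 16, 768)   # (batch, sequence, hidden_dim)
print(adapter(hidden).shape)       # torch.Size([2, 16, 768])
# During fine-tuning, only the adapter parameters would receive gradients;
# the pre-trained transformer weights stay frozen.
```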