Concept Extraction Using Pointer-Generator Networks
- URL: http://arxiv.org/abs/2008.11295v1
- Date: Tue, 25 Aug 2020 22:28:14 GMT
- Title: Concept Extraction Using Pointer-Generator Networks
- Authors: Alexander Shvets and Leo Wanner
- Abstract summary: We propose a generic open-domain OOV-oriented extractive model that is based on distant supervision of a pointer-generator network.
The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages.
- Score: 86.75999352383535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concept extraction is crucial for a number of downstream applications.
However, surprisingly enough, straightforward single token/nominal
chunk-concept alignment or dictionary lookup techniques such as DBpedia
Spotlight still prevail. We propose a generic open-domain OOV-oriented
extractive model that is based on distant supervision of a pointer-generator
network leveraging bidirectional LSTMs and a copy mechanism. The model has been
trained on a large annotated corpus compiled specifically for this task from
250K Wikipedia pages, and tested on regular pages, where the pointers to other
pages are considered as ground truth concepts. The outcome of the experiments
shows that our model significantly outperforms standard techniques and, when
used on top of DBpedia Spotlight, further improves its performance. The
experiments furthermore show that the model can be readily ported to other
datasets on which it equally achieves a state-of-the-art performance.
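For readers unfamiliar with the copy mechanism the abstract refers to, the following is a minimal sketch (not the authors' implementation) of how a pointer-generator network mixes a generation distribution over a fixed vocabulary with a copy distribution over source tokens, which is what lets it emit OOV words. The vocabulary size, attention weights, and gate value below are illustrative assumptions.

```python
import numpy as np

def pointer_generator_distribution(p_vocab, attention, src_ids, p_gen):
    """Mix generation and copy distributions, pointer-generator style.

    p_vocab   : (V,) softmax over the fixed vocabulary
    attention : (T,) attention weights over the T source tokens
    src_ids   : (T,) extended-vocabulary id of each source token
    p_gen     : scalar gate in [0, 1]; 1 = pure generation, 0 = pure copy
    """
    # The extended vocabulary covers fixed-vocab ids plus any OOV ids
    # assigned to source tokens.
    extended_size = max(len(p_vocab), int(max(src_ids)) + 1)
    p_final = np.zeros(extended_size)
    # Generation path: scaled vocabulary distribution.
    p_final[: len(p_vocab)] = p_gen * p_vocab
    # Copy path: scatter-add attention mass onto the source-token ids,
    # so OOV words (ids >= V) can still receive probability.
    np.add.at(p_final, src_ids, (1.0 - p_gen) * attention)
    return p_final

# Toy example: vocabulary of 4 words; source token with id 3 is
# in-vocab, the token with id 4 is OOV (beyond the fixed vocabulary).
p_vocab = np.array([0.1, 0.2, 0.3, 0.4])
attention = np.array([0.6, 0.4])
src_ids = np.array([3, 4])
dist = pointer_generator_distribution(p_vocab, attention, src_ids, p_gen=0.5)
```

With the gate at 0.5, the OOV token (id 4) receives half of its attention weight as probability mass even though the vocabulary softmax cannot produce it; the in-vocab source token (id 3) gets mass from both paths. In the paper's model the encoder states feeding the attention come from bidirectional LSTMs.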
Related papers
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples.
Our memory models the distribution of past keys and values through the definition of prototype vectors.
We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z)
- Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities [54.26896306906937]
We present OVEN-Wiki, where a model needs to link an image to a Wikipedia entity with respect to a text query.
We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning.
While PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.
arXiv Detail & Related papers (2023-02-22T05:31:26Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to achieve accuracy as good as models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Embarrassingly Simple Unsupervised Aspect Extraction [4.695687634290403]
We present a simple but effective method for aspect identification in sentiment analysis.
Our method only requires word embeddings and a POS tagger.
We introduce Contrastive Attention (CAt), a novel single-head attention mechanism.
arXiv Detail & Related papers (2020-04-28T15:09:51Z)
- A more abstractive summarization model [0.0]
We investigate why the pointer-generator network is unable to generate novel words.
We then address this by adding an out-of-vocabulary penalty.
We also report ROUGE scores of our model, since most summarization models are evaluated with R-1, R-2, and R-L scores.
arXiv Detail & Related papers (2020-02-25T15:22:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.