Inductive Document Network Embedding with Topic-Word Attention
- URL: http://arxiv.org/abs/2001.03369v1
- Date: Fri, 10 Jan 2020 10:14:07 GMT
- Title: Inductive Document Network Embedding with Topic-Word Attention
- Authors: Robin Brochier, Adrien Guille and Julien Velcin
- Abstract summary: Document network embedding aims at learning representations for a structured text corpus when documents are linked to each other.
Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations.
In this paper, we propose an interpretable and inductive document network embedding method.
- Score: 5.8010446129208155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document network embedding aims at learning representations for a structured
text corpus, i.e., when documents are linked to each other. Recent algorithms
extend network embedding approaches by incorporating the text content
associated with the nodes in their formulations. In most cases, it is hard to
interpret the learned representations. Moreover, little importance is given to
the generalization to new documents that are not observed within the network.
In this paper, we propose an interpretable and inductive document network
embedding method. We introduce a novel mechanism, the Topic-Word Attention
(TWA), that generates document representations based on the interplay between
word and topic representations. We train these word and topic vectors through
our general model, Inductive Document Network Embedding (IDNE), by leveraging
the connections in the document network. Quantitative evaluations show that our
approach achieves state-of-the-art performance on various networks and we
qualitatively show that our model produces meaningful and interpretable
representations of the words, topics and documents.
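The abstract describes TWA only at a high level: document representations arise from the interplay between word and topic vectors. The following is a minimal sketch of one plausible reading, where each topic attends over a document's words and the resulting topic-specific context vectors are pooled; all names, shapes, and the mean-pooling step are assumptions, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def twa_document_vector(word_vecs, topic_vecs):
    """Hypothetical topic-word attention: each topic attends over the
    document's words; the per-topic context vectors are averaged into
    a single document representation."""
    scores = topic_vecs @ word_vecs.T          # (n_topics, n_words) affinities
    weights = softmax(scores, axis=1)          # per-topic distribution over words
    topic_contexts = weights @ word_vecs       # (n_topics, dim) weighted sums
    return topic_contexts.mean(axis=0)         # pool topics -> (dim,)

rng = np.random.default_rng(0)
words = rng.normal(size=(7, 16))    # 7 word embeddings, 16-dim
topics = rng.normal(size=(4, 16))   # 4 hypothetical topic vectors
doc = twa_document_vector(words, topics)
print(doc.shape)  # (16,)
```

Because the attention weights are distributions over words per topic, they can be inspected directly, which is consistent with the interpretability claim above.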
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
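The first idea above, a contrastive objective that draws document neighbors into the in-batch loss, could be sketched as an InfoNCE-style loss where neighbor embeddings serve as extra negatives. This is an illustrative reading under assumed names and shapes, not the paper's exact objective.

```python
import numpy as np

def neighbor_contrastive_loss(queries, positives, neighbors, temp=0.1):
    """Illustrative InfoNCE-style objective (an assumption, not the paper's
    exact loss): each query is scored against its own positive document,
    with the other in-batch positives and the neighbor documents acting
    as negatives."""
    cands = np.concatenate([positives, neighbors], axis=0)  # (B + M, d)
    sims = queries @ cands.T / temp                         # (B, B + M)
    # numerically stable log-sum-exp over all candidates
    m = sims.max(axis=1, keepdims=True)
    logz = (m + np.log(np.exp(sims - m).sum(axis=1, keepdims=True))).ravel()
    pos_logits = sims[np.arange(len(queries)), np.arange(len(queries))]
    return float(np.mean(logz - pos_logits))                # mean -log p(pos)

rng = np.random.default_rng(1)
B, M, d = 4, 3, 8
q = rng.normal(size=(B, d))   # query document embeddings
p = rng.normal(size=(B, d))   # matching (positive) documents
n = rng.normal(size=(M, d))   # hypothetical neighbor documents
loss = neighbor_contrastive_loss(q, p, n)
```

Appending the neighbors to the candidate pool is what makes them act as hard negatives for every query in the batch.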
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose TeKo, a novel text-rich graph neural network with external knowledge.
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge: structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- NetReAct: Interactive Learning for Network Summarization [60.18513812680714]
We present NetReAct, a novel interactive network summarization algorithm which supports the visualization of networks induced by text corpora to perform sensemaking.
We show how NetReAct is successful in generating high-quality summaries and visualizations that reveal hidden patterns better than other non-trivial baselines.
arXiv Detail & Related papers (2020-12-22T03:56:26Z)
- Adversarial Context Aware Network Embeddings for Textual Networks [8.680676599607123]
Existing approaches learn embeddings of text and network structure by enforcing embeddings of connected nodes to be similar.
This implies that these approaches require edge information for learning embeddings and they cannot learn embeddings of unseen nodes.
We propose an approach that achieves both modality fusion and the capability to learn embeddings of unseen nodes.
arXiv Detail & Related papers (2020-11-05T05:20:01Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
- Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis [37.20068256769269]
Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicated sentiment information.
We study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA.
We design a Sentiment-based Rethinking mechanism (SR) that refines the Hierarchical Interaction Network (HIN) with sentiment label information to learn a more sentiment-aware document representation.
arXiv Detail & Related papers (2020-07-16T16:27:38Z)
- Improve Document Embedding for Text Categorization Through Deep Siamese Neural Network [2.398608007786179]
Learning low-dimensional representations of text is one of the main challenges for efficient natural language processing.
We propose the utilization of deep Siamese neural networks to map the documents with similar topics to a similar space in vector space representation.
We show that the proposed representations outperform conventional and state-of-the-art representations on the text classification task.
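The summary above describes a Siamese network that maps same-topic documents close together. A common way to train such a pair of towers is a margin-based contrastive loss; the sketch below illustrates that idea under assumed names and margin value, without claiming it is the paper's exact formulation.

```python
import numpy as np

def siamese_contrastive_loss(a, b, same_topic, margin=1.0):
    """Hedged sketch of a Siamese training objective (names and margin
    are assumptions): same-topic document pairs are pulled together,
    different-topic pairs are pushed at least `margin` apart."""
    d = np.linalg.norm(a - b, axis=1)                          # pairwise distances
    pos = same_topic * d ** 2                                  # attract same-topic pairs
    neg = (1 - same_topic) * np.maximum(0.0, margin - d) ** 2  # repel the rest
    return float(np.mean(pos + neg))

rng = np.random.default_rng(2)
a = rng.normal(size=(6, 16))       # embeddings from one tower
b = rng.normal(size=(6, 16))       # embeddings from the shared-weight twin
y = np.array([1, 0, 1, 0, 0, 1])   # 1 = same topic, 0 = different topic
loss = siamese_contrastive_loss(a, b, y)
print(loss >= 0.0)  # True
```

Minimizing this loss shapes the embedding space so that topical similarity corresponds to geometric proximity, which is what the classification results above rely on.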
arXiv Detail & Related papers (2020-05-31T17:51:08Z)
- Document Network Projection in Pretrained Word Embedding Space [7.455546102930911]
We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents into a pretrained word embedding space.
We leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph).
The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering.
arXiv Detail & Related papers (2020-01-16T10:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.