Topics as Entity Clusters: Entity-based Topics from Language Models and
Graph Neural Networks
- URL: http://arxiv.org/abs/2301.02458v1
- Date: Fri, 6 Jan 2023 10:54:54 GMT
- Title: Topics as Entity Clusters: Entity-based Topics from Language Models and
Graph Neural Networks
- Authors: Manuel V. Loureiro, Steven Derby and Tri Kurniawan Wijaya
- Abstract summary: We propose a novel approach for cluster-based topic modeling that employs conceptual entities.
Entities are language-agnostic representations of real-world concepts rich in relational information.
We demonstrate that our approach consistently outperforms other state-of-the-art topic models across coherency metrics.
- Score: 0.7734726150561089
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Topic models aim to reveal the latent structure behind a corpus, typically
conducted over a bag-of-words representation of documents. In the context of
topic modeling, most vocabulary is either irrelevant for uncovering underlying
topics or contains strong relationships with relevant concepts, impacting the
interpretability of these topics. Furthermore, their limited expressiveness and
dependency on language demand considerable computation resources. Hence, we
propose a novel approach for cluster-based topic modeling that employs
conceptual entities. Entities are language-agnostic representations of
real-world concepts rich in relational information. To this end, we extract
vector representations of entities from (i) an encyclopedic corpus using a
language model; and (ii) a knowledge base using a graph neural network. We
demonstrate that our approach consistently outperforms other state-of-the-art
topic models across coherency metrics and find that the explicit knowledge
encoded in the graph-based embeddings provides more coherent topics than the
implicit knowledge encoded with the contextualized embeddings of language
models.
Related papers
- Compositional Generalization with Grounded Language Models [9.96679221246835]
Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training.
We develop a procedure for generating natural language questions paired with knowledge graphs that targets different aspects of compositionality.
arXiv Detail & Related papers (2024-06-07T14:56:51Z)
- GINopic: Topic Modeling with Graph Isomorphism Network [0.8962460460173959]
We introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words.
We demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
arXiv Detail & Related papers (2024-04-02T17:18:48Z)
- Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework for conversational recommender systems.
KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities.
KERL achieves state-of-the-art results in both recommendation and response generation tasks.
arXiv Detail & Related papers (2023-12-18T06:41:23Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models trained to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Learning Attention-based Representations from Multiple Patterns for Relation Prediction in Knowledge Graphs [2.4028383570062606]
AEMP is a novel model for learning contextualized representations by acquiring entities' context information.
AEMP either outperforms or competes with state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-07T10:53:35Z)
- High-dimensional distributed semantic spaces for utterances [0.2907403645801429]
This paper describes a model for high-dimensional representation of utterance- and text-level data.
It is based on a mathematically principled and behaviourally plausible approach to representing linguistic information.
The paper shows how the implemented model is able to represent a broad range of linguistic features in a common integral framework of fixed dimensionality.
arXiv Detail & Related papers (2021-04-01T12:09:47Z)
- Explainable and Discourse Topic-aware Neural Language Understanding [22.443597046878086]
Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences.
Existing approaches incorporate latent document topic proportions and ignore topical discourse in sentences of the document.
We present a novel neural composite language model that exploits both latent and explainable topics along with topical discourse at the sentence level.
arXiv Detail & Related papers (2020-06-18T15:53:58Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.