Triplètoile: Extraction of Knowledge from Microblogging Text
- URL: http://arxiv.org/abs/2408.14908v1
- Date: Tue, 27 Aug 2024 09:35:13 GMT
- Title: Triplètoile: Extraction of Knowledge from Microblogging Text
- Authors: Vanni Zavarella, Sergio Consoli, Diego Reforgiato Recupero, Gianni Fenu, Simone Angioni, Davide Buscaldi, Danilo Dessì, Francesco Osborne,
- Abstract summary: We propose an enhanced information extraction pipeline tailored to the extraction of a knowledge graph comprising open-domain entities from micro-blogging posts on social media platforms.
Our pipeline leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over word embeddings.
We provide a use case on extracting semantic triples from a corpus of 100 thousand tweets about digital transformation and publicly release the generated knowledge graph.
- Score: 7.848242781280095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous methods and pipelines have recently emerged for the automatic extraction of knowledge graphs from documents such as scientific publications and patents. However, adapting these methods to incorporate alternative text sources like micro-blogging posts and news has proven challenging as they struggle to model open-domain entities and relations, typically found in these sources. In this paper, we propose an enhanced information extraction pipeline tailored to the extraction of a knowledge graph comprising open-domain entities from micro-blogging posts on social media platforms. Our pipeline leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over word embeddings. We provide a use case on extracting semantic triples from a corpus of 100 thousand tweets about digital transformation and publicly release the generated knowledge graph. On the same dataset, we conduct two experimental evaluations, showing that the system produces triples with precision over 95% and outperforms similar pipelines of around 5% in terms of precision, while generating a comparatively higher number of triples.
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Consistency Guided Knowledge Retrieval and Denoising in LLMs for
Zero-shot Document-level Relation Triplet Extraction [43.50683283748675]
Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities with semantic relations from a document.
Existing methods heavily rely on a substantial amount of fully labeled data.
Recent advanced Large Language Models (LLMs), such as ChatGPT and LLaMA, exhibit impressive long-text generation capabilities.
arXiv Detail & Related papers (2024-01-24T17:04:28Z) - Article Classification with Graph Neural Networks and Multigraphs [0.12499537119440243]
We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with multi-graph representations.
fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark OGBN-arXiv dataset and the PubMed diabetes dataset.
Results demonstrate that multi-graphs consistently improve the performance of a variety of GNN models compared to the default graphs.
arXiv Detail & Related papers (2023-09-20T14:18:04Z) - ProVe: A Pipeline for Automated Provenance Verification of Knowledge
Graphs against Textual Sources [5.161088104035106]
ProVe is a pipelined approach that automatically verifies whether a Knowledge Graph triple is supported by text extracted from its documented provenance.
ProVe is evaluated on a Wikidata dataset, achieving promising results overall and excellent performance on the binary classification task of detecting support from provenance.
arXiv Detail & Related papers (2022-10-26T16:47:36Z) - Walk-and-Relate: A Random-Walk-based Algorithm for Representation
Learning on Sparse Knowledge Graphs [5.444459446244819]
We propose an efficient method to augment the number of triplets to address the problem of data sparsity.
We also provide approaches to accurately and efficiently filter out informative metapaths from the possible set of metapaths.
The proposed approaches are model-agnostic, and the augmented training dataset can be used with any KG embedding approach out of the box.
arXiv Detail & Related papers (2022-09-19T05:35:23Z) - Repurposing Knowledge Graph Embeddings for Triple Representation via
Weak Supervision [77.34726150561087]
Current methods learn triple embeddings from scratch without utilizing entity and predicate embeddings from pre-trained models.
We develop a method for automatically sampling triples from a knowledge graph and estimating their pairwise similarities from pre-trained embedding models.
These pairwise similarity scores are then fed to a Siamese-like neural architecture to fine-tune triple representations.
arXiv Detail & Related papers (2022-08-22T14:07:08Z) - Modeling Multi-Granularity Hierarchical Features for Relation Extraction [26.852869800344813]
We propose a novel method to extract multi-granularity features based solely on the original input sentences.
We show that effective structured features can be attained even without external knowledge.
arXiv Detail & Related papers (2022-04-09T09:44:05Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs)
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z) - ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z) - Heterogeneous Graph Neural Networks for Extractive Document
Summarization [101.17980994606836]
Cross-sentence relations are a crucial step in extractive document summarization.
We present a graph-based neural network for extractive summarization (HeterSumGraph)
We introduce different types of nodes into graph-based neural networks for extractive document summarization.
arXiv Detail & Related papers (2020-04-26T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.