CQE: A Comprehensive Quantity Extractor
- URL: http://arxiv.org/abs/2305.08853v1
- Date: Mon, 15 May 2023 17:59:41 GMT
- Title: CQE: A Comprehensive Quantity Extractor
- Authors: Satya Almasian, Vivian Kazakova, Philip Göldner and Michael Gertz
- Abstract summary: We present a comprehensive quantity extraction framework from text data.
It efficiently detects combinations of values and units, the behavior of a quantity, and the concept a quantity is associated with.
Our framework makes use of dependency parsing and a dictionary of units, and it provides for a proper normalization and standardization of detected quantities.
- Score: 2.2079886535603084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantities are essential in documents to describe factual information. They
are ubiquitous in application domains such as finance, business, medicine, and
science in general. Compared to other information extraction approaches,
interestingly only a few works exist that describe methods for a proper
extraction and representation of quantities in text. In this paper, we present
such a comprehensive quantity extraction framework from text data. It
efficiently detects combinations of values and units, the behavior of a
quantity (e.g., rising or falling), and the concept a quantity is associated
with. Our framework makes use of dependency parsing and a dictionary of units,
and it provides for a proper normalization and standardization of detected
quantities. Using a novel dataset for evaluation, we show that our open source
framework outperforms other systems and -- to the best of our knowledge -- is
the first to detect concepts associated with identified quantities. The code
and data underlying our framework are available at
https://github.com/vivkaz/CQE.
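The linked repository contains the authors' implementation. As a rough, standalone illustration of the value/unit detection and normalization step described in the abstract, the sketch below pairs numbers with entries from a small unit dictionary and converts them to base units. The names UNIT_DICT, Quantity, and extract_quantities are invented for this example, and the sketch deliberately omits the dependency parsing, behavior detection (rising/falling), and concept detection that the actual framework performs.

```python
import re
from dataclasses import dataclass

# Toy unit dictionary: surface forms -> (standardized unit, factor to the base unit).
# CQE ships a far larger dictionary and resolves units via dependency parsing;
# this mapping exists only to keep the example self-contained.
UNIT_DICT = {
    "km": ("metre", 1000.0),
    "kilometres": ("metre", 1000.0),
    "m": ("metre", 1.0),
    "cm": ("metre", 0.01),
    "kg": ("kilogram", 1.0),
    "g": ("kilogram", 0.001),
    "%": ("percent", 1.0),
}

@dataclass
class Quantity:
    surface: str   # text span the quantity was detected in
    value: float   # numeric value converted to the base unit
    unit: str      # standardized unit name

# A number, optionally followed by a unit token (e.g. "42 km", "2.5 kg", "12%").
VALUE_UNIT = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|[A-Za-z]+)?")

def extract_quantities(text: str) -> list[Quantity]:
    """Detect value/unit combinations and normalize them to base units."""
    found = []
    for match in VALUE_UNIT.finditer(text):
        value = float(match.group("value"))
        unit = (match.group("unit") or "").lower()
        if unit in UNIT_DICT:
            standard_unit, factor = UNIT_DICT[unit]
            found.append(Quantity(match.group(0), value * factor, standard_unit))
        elif unit == "":
            # Bare number: keep it as a dimensionless quantity.
            found.append(Quantity(match.group(0), value, "dimensionless"))
        # Numbers followed by unknown words are skipped in this toy version.
    return found

if __name__ == "__main__":
    sample = "The route is 42 km long and the package weighs 2.5 kg, a 12% increase."
    for q in extract_quantities(sample):
        print(f"{q.surface!r} -> {q.value} {q.unit}")
```

In the full framework, the dependency tree is what links a detected value to its unit, its trend (e.g., "rose by"), and the noun phrase acting as its concept; a dictionary lookup like the one above only covers the unit standardization part.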
Related papers
- Numbers Matter! Bringing Quantity-awareness to Retrieval Systems [5.7486903101353715]
We introduce two quantity-aware ranking techniques designed to rank both the quantity and textual content either jointly or independently.
These techniques incorporate quantity information into available retrieval systems and can address queries with numerical conditions (equal to, greater than, and less than).
To evaluate the effectiveness of our proposed models, we introduce two novel quantity-aware benchmark datasets in the domains of finance and medicine.
arXiv Detail & Related papers (2024-07-14T17:56:11Z)
- Text-To-KG Alignment: Comparing Current Methods on Classification Tasks [2.191505742658975]
Knowledge graphs (KGs) provide dense and structured representations of factual information.
Recent work has focused on creating pipeline models that retrieve information from KGs as additional context.
It is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query.
arXiv Detail & Related papers (2023-06-05T13:45:45Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z)
- How "Multi" is Multi-Document Summarization? [15.574673241564932]
It is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on dispersed information.
We propose an automated measure for evaluating the degree to which a summary is "disperse".
Our results show that certain MDS datasets barely require combining information from multiple documents, where a single document often covers the full summary content.
arXiv Detail & Related papers (2022-10-23T10:20:09Z)
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
- A system for information extraction from scientific texts in Russian [0.0]
The system performs several tasks in an end-to-end manner: term recognition, extraction of relations between terms, and term linking with entities from the knowledge base.
The advantage of the implemented methods is that the system does not require a large amount of labeled data, which saves time and effort for data labeling.
The source code is publicly available and can be used for different research purposes.
arXiv Detail & Related papers (2021-09-14T14:08:37Z) - QuaPy: A Python-Based Framework for Quantification [76.22817970624875]
QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation).
It is written in Python and can be installed via pip.
arXiv Detail & Related papers (2021-06-18T13:57:11Z) - KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z) - Extending Text Informativeness Measures to Passage Interestingness
Evaluation (Language Model vs. Word Embedding) [1.2998637003026272]
This paper defines the concept of Interestingness as a generalization of Informativeness.
We then study the ability of state-of-the-art Informativeness measures to cope with this generalization.
We prove that the CLEF-INEX Tweet Contextualization 2012 Logarithm Similarity measure provides the best results.
arXiv Detail & Related papers (2020-04-14T18:22:48Z) - Inferential Text Generation with Multiple Knowledge Sources and
Meta-Learning [117.23425857240679]
We study the problem of generating inferential texts of events for a variety of commonsense-like if-else relations.
Existing approaches typically use limited evidence from training examples and learn for each relation individually.
In this work, we use multiple knowledge sources as fuels for the model.
arXiv Detail & Related papers (2020-04-07T01:49:18Z)