Nearest Neighbor Search over Vectorized Lexico-Syntactic Patterns for
Relation Extraction from Financial Documents
- URL: http://arxiv.org/abs/2310.17714v1
- Date: Thu, 26 Oct 2023 18:19:56 GMT
- Authors: Pawan Kumar Rajpoot, Ankur Parikh
- Abstract summary: We introduce a simple approach that consults training relations at test time through a nearest-neighbor search over dense vectors of lexico-syntactic patterns.
We evaluate our approach on REFinD and show that our method achieves state-of-the-art performance.
- Score: 1.068607542484439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relation extraction (RE) has achieved remarkable progress with the help of
pre-trained language models. However, existing RE models are usually incapable
of handling two situations: implicit expressions and long-tail relation
classes, caused by language complexity and data sparsity. Further, these
approaches and models are largely inaccessible to users who don't have direct
access to large language models (LLMs) and/or infrastructure for supervised
training or fine-tuning. Rule-based systems also struggle with implicit
expressions. In addition, real-world financial documents such as the various
10-X reports (including 10-K, 10-Q, etc.) of publicly traded companies pose
another challenge to rule-based systems because of their longer and more complex
sentences. In this paper, we introduce a simple approach that consults training
relations at test time through a nearest-neighbor search over dense vectors of
lexico-syntactic patterns and provides a simple yet effective means to tackle
the above issues. We evaluate our approach on REFinD and show that our method
achieves state-of-the-art performance. We further show that it can provide a
good starting point for a human-in-the-loop setup when only a small number of
annotations is available, and that it is also beneficial when domain experts can
provide high-quality patterns.
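The idea in the abstract, consulting labeled training relations at test time through a nearest-neighbor search over vectorized patterns, can be illustrated with a minimal sketch. This is not the authors' implementation: the pattern construction (typed entity placeholders plus the words between them), the token-count vectors standing in for a dense encoder, the similarity-weighted vote, the relation labels, and all example sentences are assumptions made for illustration only.

```python
import math
from collections import Counter

def make_pattern(tokens, head, tail):
    """Toy pattern: typed entity placeholders plus the lowercased words between them.

    head and tail are (start, end, entity_type) spans with end exclusive.
    This construction is hypothetical, not the paper's pattern definition.
    """
    first, second = sorted([("HEAD",) + head, ("TAIL",) + tail], key=lambda m: m[1])
    between = [t.lower() for t in tokens[first[2]:second[1]]]
    return [f"{first[0]}:{first[3]}"] + between + [f"{second[0]}:{second[3]}"]

def embed(pattern):
    """Stand-in vectorizer: token counts. A real system would use a dense encoder."""
    return Counter(pattern)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(query_vec, index, k=3):
    """Return (label, neighbors) using a similarity-weighted vote over the k nearest entries."""
    scored = sorted(((cosine(query_vec, vec), label) for vec, label in index), reverse=True)
    neighbors = scored[:k]
    votes = Counter()
    for sim, label in neighbors:
        votes[label] += sim
    return votes.most_common(1)[0][0], neighbors

# Hypothetical training relations: (tokens, head span, tail span, relation label).
train = [
    ("John Smith serves as chief executive officer of Acme Corp".split(),
     (0, 2, "PERSON"), (8, 10, "ORG"), "employee_of"),
    ("Jane Doe was appointed director of Beta Inc".split(),
     (0, 2, "PERSON"), (6, 8, "ORG"), "employee_of"),
    ("Gamma LLC is a wholly owned subsidiary of Delta Holdings".split(),
     (0, 2, "ORG"), (8, 10, "ORG"), "subsidiary_of"),
    ("Epsilon Ltd operates as a subsidiary of Zeta Group".split(),
     (0, 2, "ORG"), (7, 9, "ORG"), "subsidiary_of"),
]

# The "index" is just the list of training pattern vectors with their labels.
index = [(embed(make_pattern(toks, h, t)), label) for toks, h, t, label in train]

# At test time, build the same kind of pattern and look up its nearest neighbors.
query = make_pattern("Omega Co became a subsidiary of Sigma Partners".split(),
                     (0, 2, "ORG"), (6, 8, "ORG"))
prediction, neighbors = knn_predict(embed(query), index, k=3)
print(query)
print(prediction, neighbors)
```

Because the index only stores vectors and labels, adding a few new annotations or expert-written patterns amounts to appending entries, with no retraining step in this sketch.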
Related papers
- Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning [34.85741925091139]
Graph-DPEP framework is grounded in the reasoning behind triplet explanation thoughts presented in natural language.
We develop "ensemble-play", reapplying generation on the entire type list by leveraging the reasoning thoughts embedded in a sub-graph.
(2024-11-05T07:12:36Z)
- Unsupervised Approach to Evaluate Sentence-Level Fluency: Do We Really Need Reference? [3.2528685897001455]
This paper adapts an existing unsupervised technique for measuring text fluency without the need for any reference.
Our approach leverages various word embeddings and trains language models using Recurrent Neural Network (RNN) architectures.
To assess the performance of the models, we conduct a comparative analysis across 10 Indic languages.
(2023-12-03T20:09:23Z)
- Understanding prompt engineering may not require rethinking generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
(2023-10-06T00:52:48Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
(2023-08-15T17:59:18Z)
- Are Large Language Models Robust Coreference Resolvers? [17.60248310475889]
We show that prompting for coreference can outperform current unsupervised coreference systems.
Further investigations reveal that instruction-tuned LMs generalize surprisingly well across domains, languages, and time periods.
(2023-05-23T19:38:28Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
(2023-02-12T15:05:10Z)
- Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences [1.8592384822257952]
We propose a novel topic modeling and inference algorithm.
We leverage pre-trained sentence embeddings by combining generative process models and clustering.
The evaluation shows that our method yields state-of-the-art results with relatively low computational demands.
(2023-02-06T20:13:11Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
(2022-12-19T11:26:23Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
(2022-10-19T16:40:28Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
(2022-04-22T10:45:01Z)