Leveraging Linguistically Enhanced Embeddings for Open Information Extraction
- URL: http://arxiv.org/abs/2403.13903v1
- Date: Wed, 20 Mar 2024 18:18:48 GMT
- Title: Leveraging Linguistically Enhanced Embeddings for Open Information Extraction
- Authors: Fauzan Farooqui, Thanmay Jayakumar, Pulkit Mathur, Mansi Radke
- Abstract summary: Open Information Extraction (OIE) is a structured prediction task in Natural Language Processing (NLP).
We are the first to leverage linguistic features with a Seq2Seq PLM for OIE.
Our work can give any neural OIE architecture the key performance boost from both PLMs and linguistic features in one go.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open Information Extraction (OIE) is a structured prediction (SP) task in Natural Language Processing (NLP) that aims to extract structured $n$-ary tuples - usually subject-relation-object triples - from free text. The word embeddings in the input text can be enhanced with linguistic features, usually Part-of-Speech (PoS) and Syntactic Dependency Parse (SynDP) labels. However, past enhancement techniques cannot leverage the power of pretrained language models (PLMs), which themselves have been hardly used for OIE. To bridge this gap, we are the first to leverage linguistic features with a Seq2Seq PLM for OIE. We do so by introducing two methods - Weighted Addition and Linearized Concatenation. Our work can give any neural OIE architecture the key performance boost from both PLMs and linguistic features in one go. In our settings, this shows wide improvements of up to 24.9%, 27.3% and 14.9% on Precision, Recall and F1 scores respectively over the baseline. Beyond this, we address other important challenges in the field: to reduce compute overheads with the features, we are the first ones to exploit Semantic Dependency Parse (SemDP) tags; to address flaws in current datasets, we create a clean synthetic dataset; finally, we contribute the first known study of OIE behaviour in SP models.
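The abstract names two feature-injection methods, Weighted Addition and Linearized Concatenation, for combining Part-of-Speech and dependency-parse label embeddings with the token embeddings of a Seq2Seq PLM. The PyTorch sketch below is a minimal, hypothetical reading of those two ideas (a learnable weighted sum versus concatenate-and-project); the module name, weights, and projection layer are illustrative assumptions, not the paper's exact implementation.
```python
import torch
import torch.nn as nn

class LinguisticEmbeddingEnhancer(nn.Module):
    """Illustrative sketch (not the paper's exact code): fuse PoS and
    dependency-label embeddings with PLM token embeddings."""

    def __init__(self, hidden_size, num_pos_tags, num_dep_labels,
                 mode="weighted_addition"):
        super().__init__()
        self.mode = mode
        self.pos_emb = nn.Embedding(num_pos_tags, hidden_size)
        self.dep_emb = nn.Embedding(num_dep_labels, hidden_size)
        if mode == "weighted_addition":
            # Learnable scalar weights for each linguistic stream (assumed).
            self.alpha = nn.Parameter(torch.tensor(0.1))
            self.beta = nn.Parameter(torch.tensor(0.1))
        elif mode == "linearized_concatenation":
            # Concatenate the three streams and project back to hidden_size
            # so the result still fits the PLM encoder (assumed).
            self.proj = nn.Linear(3 * hidden_size, hidden_size)
        else:
            raise ValueError(f"unknown mode: {mode}")

    def forward(self, token_embs, pos_ids, dep_ids):
        # token_embs: (batch, seq_len, hidden) from the PLM embedding layer;
        # pos_ids / dep_ids: (batch, seq_len) integer label ids per token.
        pos = self.pos_emb(pos_ids)
        dep = self.dep_emb(dep_ids)
        if self.mode == "weighted_addition":
            return token_embs + self.alpha * pos + self.beta * dep
        return self.proj(torch.cat([token_embs, pos, dep], dim=-1))

# Toy usage with random inputs (hidden size and tag counts are placeholders).
enhancer = LinguisticEmbeddingEnhancer(hidden_size=768, num_pos_tags=18,
                                       num_dep_labels=45)
tokens = torch.randn(2, 10, 768)
pos_ids = torch.randint(0, 18, (2, 10))
dep_ids = torch.randint(0, 45, (2, 10))
out = enhancer(tokens, pos_ids, dep_ids)   # shape: (2, 10, 768)
```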
Related papers
- Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images.
We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
- Leveraging Code to Improve In-context Learning for Semantic Parsing [48.66031267718704]
In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization.
We improve the effectiveness of ICL for semantic parsing by (1) using general-purpose programming languages such as Python instead of DSLs, and (2) augmenting prompts with a structured domain description.
arXiv Detail & Related papers (2023-11-16T02:50:06Z)
- Improving Domain-Specific Retrieval by NLI Fine-Tuning [64.79760042717822]
This article investigates the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking.
We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data.
Our results indicate that NLI fine-tuning improves model performance on both tasks and in both languages, with potential gains for both monolingual and multilingual models.
arXiv Detail & Related papers (2023-08-06T12:40:58Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z)
- DetIE: Multilingual Open Information Extraction Inspired by Object Detection [10.269858179091111]
We present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision.
We show a performance improvement of 15% on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish.
arXiv Detail & Related papers (2022-06-24T23:47:00Z)
- Training Naturalized Semantic Parsers with Very Little Data [10.709587018625275]
State-of-the-art (SOTA) semantic parsers are seq2seq architectures based on large language models that have been pretrained on vast amounts of text.
Recent work has explored a reformulation of semantic parsing whereby the output sequences are themselves natural language sentences.
We show that this method delivers new SOTA few-shot performance on the Overnight dataset.
arXiv Detail & Related papers (2022-04-29T17:14:54Z)
- The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task [19.98425994656106]
We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies.
The task involves parsing Enhanced Universal Dependencies graphs, which extend basic dependency trees to better represent semantic structure.
Evaluation is carried out on 29 treebanks in 17 languages, and participants are required to parse the data in each language starting from raw strings.
arXiv Detail & Related papers (2021-07-05T12:42:59Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.