Transformer Enhanced Relation Classification: A Comparative Analysis of Contextuality, Data Efficiency and Sequence Complexity
- URL: http://arxiv.org/abs/2509.11374v1
- Date: Sun, 14 Sep 2025 18:11:31 GMT
- Title: Transformer Enhanced Relation Classification: A Comparative Analysis of Contextuality, Data Efficiency and Sequence Complexity
- Authors: Bowen Jing, Yang Cui, Tianpeng Huang
- Abstract summary: We systematically compare the performance of deep supervised learning approaches without transformers and those with transformers. The results show that transformer-based models outperform non-transformer models, achieving micro F1 scores of 80-90%. We briefly review the research journey in supervised relation classification and discuss the role and current status of large language models (LLMs) in relation extraction.
- Score: 9.402566546100973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of large language models, relation extraction (RE) plays an important role in information extraction through the transformation of unstructured raw text into structured data (Wadhwa et al., 2023). In this paper, we systematically compare the performance of deep supervised learning approaches without transformers and those with transformers. We used a series of non-transformer architectures such as PA-LSTM (Zhang et al., 2017), C-GCN (Zhang et al., 2018), and AGGCN (attention-guided GCN) (Guo et al., 2019), and a series of transformer architectures such as BERT, RoBERTa, and R-BERT (Wu and He, 2019). Our comparison included traditional metrics like micro F1, as well as evaluations in different scenarios, varying sentence lengths, and different percentages of the dataset for training. Our experiments were conducted on TACRED, TACREV, and RE-TACRED. The results show that transformer-based models outperform non-transformer models, achieving micro F1 scores of 80-90% compared to 64-67% for non-transformer models. Additionally, we briefly review the research journey in supervised relation classification and discuss the role and current status of large language models (LLMs) in relation extraction.
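The abstract reports micro F1 as its headline metric. On the TACRED family of benchmarks, this score is conventionally computed with the negative "no_relation" class excluded from the precision and recall counts. A minimal sketch of that scoring convention follows; the label names in the example are illustrative and not taken from the paper.

```python
# Minimal sketch of TACRED-style micro F1 for relation classification.
# Assumes the common convention of excluding the negative "no_relation"
# class from the precision/recall counts; labels below are illustrative.

def micro_f1(gold, pred, negative_label="no_relation"):
    correct = guessed = actual = 0
    for g, p in zip(gold, pred):
        if g != negative_label:
            actual += 1          # gold positive instances
        if p != negative_label:
            guessed += 1         # predicted positive instances
            if g == p:
                correct += 1     # correctly predicted relations
    precision = correct / guessed if guessed else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    gold = ["per:title", "no_relation", "org:founded_by", "per:title"]
    pred = ["per:title", "per:title", "no_relation", "per:title"]
    print(f"micro F1 = {micro_f1(gold, pred):.3f}")  # 0.667
```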
Related papers
- Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers [59.472505916020936]
We investigate how transformers learn to associate information across modalities from in-context examples. We revisit core principles of unimodal ICL in modern transformers. Mechanistic analysis shows that both settings rely on an induction-style mechanism that copies labels from matching in-context exemplars.
arXiv Detail & Related papers (2026-01-28T17:37:28Z) - Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding [10.484788943232674]
This paper explores the advancements in transformer models, such as BERT and GPT, focusing on their superior performance in text understanding tasks. The results demonstrate state-of-the-art performance on benchmarks like GLUE and SQuAD, with F1 scores exceeding 90%, though challenges such as high computational costs persist.
arXiv Detail & Related papers (2025-03-26T04:45:33Z) - Probing the limit of hydrologic predictability with the Transformer network [7.326504492614808]
We show that a vanilla Transformer architecture is not competitive against LSTM on the widely benchmarked CAMELS dataset.
A recurrence-free variant of the Transformer achieves mixed results against LSTM, matching it on the Kling-Gupta efficiency coefficient (KGE) and other metrics.
While the Transformer results are not higher than current state-of-the-art, we still learned some valuable lessons.
arXiv Detail & Related papers (2023-06-21T17:06:54Z) - Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction [6.78974856327994]
Adverse Event (ADE) extraction is one of the core tasks in digital pharmacovigilance.
We evaluate 19 Transformer-based models for ADE extraction on informal texts.
At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.
arXiv Detail & Related papers (2023-06-08T15:25:24Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches; this is the first time such a result has been demonstrated.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z) - Long Range Arena: A Benchmark for Efficient Transformers [115.1654897514089]
The Long Range Arena benchmark is a suite of tasks consisting of sequences ranging from 1K to 16K tokens.
We systematically evaluate ten well-established long-range Transformer models on our newly proposed benchmark suite.
arXiv Detail & Related papers (2020-11-08T15:53:56Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)