A Global Context Mechanism for Sequence Labeling
- URL: http://arxiv.org/abs/2305.19928v6
- Date: Sun, 06 Jul 2025 08:50:26 GMT
- Title: A Global Context Mechanism for Sequence Labeling
- Authors: Conglei Xu, Kun Shen, Hongguang Sun, Yang Xu
- Abstract summary: Global sentence information is crucial for sequence labeling tasks, where each word in a sentence must be assigned a label. Previous work has proposed various RNN variants to integrate global sentence information into word representations. We introduce a simple yet effective mechanism that addresses these limitations. Our approach efficiently supplements global sentence information for both BiLSTM and transformer-based models.
- Score: 3.237003512894164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Global sentence information is crucial for sequence labeling tasks, where each word in a sentence must be assigned a label. While BiLSTM models are widely used, they often fail to capture sufficient global context for inner words. Previous work has proposed various RNN variants to integrate global sentence information into word representations. However, these approaches suffer from three key limitations: (1) they are slower than the original BiLSTM in both inference and training, (2) they cannot effectively supplement global information for transformer-based models, and (3) they incur a high time cost for reimplementing and integrating these customized RNNs into existing architectures. In this study, we introduce a simple yet effective mechanism that addresses these limitations. Our approach efficiently supplements global sentence information for both BiLSTM and transformer-based models, with minimal degradation in inference and training speed, and is easily pluggable into current architectures. We demonstrate significant improvements in F1 scores across seven popular benchmarks, including Named Entity Recognition (NER) tasks such as CoNLL2003, WNUT2017, and the Chinese named-entity recognition task Weibo, as well as End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) benchmarks such as Laptop14, Restaurant14, Restaurant15, and Restaurant16. Without any extra strategy, we achieve the third-highest score on the Weibo NER benchmark. Compared to CRF, one of the most popular frameworks for sequence labeling, our mechanism achieves competitive F1 scores while offering superior inference and training speed. Code is available at: https://github.com/conglei2XU/Global-Context-Mechanism
Related papers
- Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning [50.27838512822097]
We introduce GlobalQA, the first benchmark specifically designed to evaluate global RAG capabilities. We propose GlobalRAG, a multi-tool collaborative framework that preserves structural coherence through chunk-level retrieval. On the Qwen2.5-14B model, GlobalRAG achieves 6.63 F1 compared to the strongest baseline's 1.51 F1.
arXiv Detail & Related papers (2025-10-30T07:29:14Z) - End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF: A Reproducibility Study [1.7188280334580195]
We present a study of the state-of-the-art neural architecture for sequence labeling proposed by Ma and Hovy. The original BiLSTM-CNN-CRF model combines character-level representations via Convolutional Neural Networks (CNNs), word-level context modeling through BiLSTMs, and structured prediction using Conditional Random Fields (CRFs). Our implementation successfully reproduces the key results, achieving 91.18% F1-score on CoNLL-2003 NER and demonstrating the model's effectiveness across sequence labeling tasks.
arXiv Detail & Related papers (2025-10-13T02:49:21Z) - Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID [82.12123628480371]
Unsupervised person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design a contrastive learning framework for global feature learning. We propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up an objective for the specific fine-grained patterns emphasized by each modality.
arXiv Detail & Related papers (2025-04-27T13:58:12Z) - FewTopNER: Integrating Few-Shot Learning with Topic Modeling and Named Entity Recognition in a Multilingual Framework [0.0]
FewTopNER is a framework that integrates few-shot named entity recognition with topic-aware contextual modeling. Empirical evaluations on multilingual benchmarks demonstrate FewTopNER significantly outperforms state-of-the-art few-shot NER models.
arXiv Detail & Related papers (2025-02-04T15:13:40Z) - Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z) - ORIGAMI: A generative transformer architecture for predictions from semi-structured data [3.5639148953570836]
ORIGAMI is a transformer-based architecture that processes nested key/value pairs. By reformulating classification as next-token prediction, ORIGAMI naturally handles both single-label and multi-label tasks.
arXiv Detail & Related papers (2024-12-23T07:21:17Z) - Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
arXiv Detail & Related papers (2024-06-19T08:22:32Z) - Hyperbolic sentence representations for solving Textual Entailment [0.0]
We use the Poincare ball to embed sentences with the goal of proving how hyperbolic spaces can be used for solving Textual Entailment.
We evaluate against baselines of various backgrounds, including LSTMs, Order Embeddings and Euclidean Averaging.
We consistently outperform the baselines on the SICK dataset and are second only to Order Embeddings on the SNLI dataset.
arXiv Detail & Related papers (2024-06-15T15:39:43Z) - CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner [41.001366870464636]
We propose to leverage text description generated from large language models to guide feature learning.
We first utilize the global text description to guide the skeleton encoder to focus on informative joints.
We build non-local interaction between local text and joint features, to form the final global representation.
arXiv Detail & Related papers (2024-03-15T07:51:35Z) - USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z) - Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z) - Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z) - Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z) - Exploiting Global Contextual Information for Document-level Named Entity Recognition [46.99922251839363]
We propose a model called Global Context enhanced Document-level NER (GCDoc).
At word-level, a document graph is constructed to model a wider range of dependencies between words.
At sentence-level, to appropriately model wider context beyond a single sentence, we employ a cross-sentence module.
Our model reaches F1 score of 92.22 (93.40 with BERT) on CoNLL 2003 dataset and 88.32 (90.49 with BERT) on Ontonotes 5.0 dataset.
arXiv Detail & Related papers (2021-06-02T01:52:07Z) - Reformulating Sentence Ordering as Conditional Text Generation [17.91448517871621]
We present Reorder-BART (RE-BART), a sentence ordering framework.
We reformulate the task as a conditional text-to-marker generation setup.
Our framework achieves state-of-the-art performance across six datasets in the Perfect Match Ratio (PMR) and Kendall's tau ($\tau$) metrics.
arXiv Detail & Related papers (2021-04-14T18:16:47Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Syntactic representation learning for neural network based TTS with syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
arXiv Detail & Related papers (2020-12-13T05:52:07Z) - BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling [6.196023076311228]
We propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics.
We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations.
Experimental results demonstrate that our model outperforms most closely related baselines under automatic evaluation metrics BLEU and CIDEr.
arXiv Detail & Related papers (2020-12-03T18:07:28Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z) - Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector [0.0]
The Bidirectional Long Short-Term Memory Network (Bi-LSTM) has shown promising performance in sentiment classification tasks.
We propose using an existing document representation method called paragraph vector as additional input features for the Bi-LSTM.
arXiv Detail & Related papers (2020-09-12T03:43:30Z) - BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from natural language inference dataset and word/phrase-level representations from paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.