Cascaded Semantic and Positional Self-Attention Network for Document
Classification
- URL: http://arxiv.org/abs/2009.07148v2
- Date: Sat, 19 Sep 2020 18:43:59 GMT
- Title: Cascaded Semantic and Positional Self-Attention Network for Document
Classification
- Authors: Juyong Jiang, Jie Zhang, Kai Zhang
- Abstract summary: We propose a new architecture to aggregate the two sources of information using a cascaded semantic and positional self-attention network (CSPAN).
The CSPAN uses a semantic self-attention layer cascaded with a Bi-LSTM to process the semantic and positional information sequentially, and then adaptively combines them through a residual connection.
We evaluate the CSPAN model on several benchmark data sets for document classification with careful ablation studies, and demonstrate encouraging results compared with the state of the art.
- Score: 9.292885582770092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have shown great success in learning representations for
language modelling. However, an open challenge still remains on how to
systematically aggregate semantic information (word embedding) with positional
(or temporal) information (word orders). In this work, we propose a new
architecture to aggregate the two sources of information using cascaded
semantic and positional self-attention network (CSPAN) in the context of
document classification. The CSPAN uses a semantic self-attention layer
cascaded with a Bi-LSTM to process the semantic and positional information in a
sequential manner, and then adaptively combines them through a residual
connection. Compared with commonly used positional encoding schemes, CSPAN can
exploit the interaction between semantics and word positions in a more
interpretable and adaptive manner, and the classification performance can be
notably improved while simultaneously preserving a compact model size and high
convergence rate. We evaluate the CSPAN model on several benchmark data sets
for document classification with careful ablation studies, and demonstrate
encouraging results compared with the state of the art.
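As a rough illustration of this design, below is a minimal PyTorch sketch of one cascaded block: semantic self-attention over the word embeddings, a Bi-LSTM over the attended sequence to inject word-order information, and a residual combination of the two. The layer sizes, head count, and pooling classifier are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class CSPANBlock(nn.Module):
    """Sketch of the cascade described in the abstract (assumed details)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Semantic self-attention over word embeddings (order-agnostic).
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Bi-LSTM re-encodes the attended sequence, injecting positional
        # (temporal) information by construction; dim must be even here.
        self.bilstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) word embeddings, no positional encoding added
        sem, _ = self.self_attn(x, x, x)   # semantic interactions
        pos, _ = self.bilstm(sem)          # order-aware re-encoding
        return self.norm(sem + pos)        # residual combination of the two

# Toy usage: mean-pool the block output and classify into 5 document classes.
block = CSPANBlock(dim=128)
h = block(torch.randn(2, 50, 128))          # (2, 50, 128)
logits = nn.Linear(128, 5)(h.mean(dim=1))   # (2, 5)
```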
Related papers
- Exploiting Contextual Target Attributes for Target Sentiment
Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task.
We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as inputs the sentences of textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- Contextual Dictionary Lookup for Knowledge Graph Completion [32.493168863565465]
Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples.
Most existing embedding models map each relation into a unique vector, overlooking their specific fine-grained semantics under different entities.
We present a novel method utilizing contextual dictionary lookup, enabling conventional embedding models to learn fine-grained semantics of relations in an end-to-end manner.
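For context, the "unique vector per relation" design criticized above is exemplified by TransE, where one shared relation embedding scores all entity pairs. A minimal sketch of that baseline scoring function follows; the paper's contextual dictionary lookup mechanism itself is not reproduced here.

```python
import torch

def transe_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """TransE plausibility score for triples (h, r, t): higher is better.
    The relation vector r is shared across all entities, which is exactly
    the coarseness that an entity-conditioned (contextual) lookup targets."""
    return -torch.norm(h + r - t, p=2, dim=-1)

# Toy usage with batched 64-dimensional embeddings.
h, r, t = (torch.randn(8, 64) for _ in range(3))
scores = transe_score(h, r, t)   # shape: (8,)
```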
arXiv Detail & Related papers (2023-06-13T12:13:41Z)
- Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- Co-Driven Recognition of Semantic Consistency via the Fusion of Transformer and HowNet Sememes Knowledge [6.184249194474601]
This paper proposes a co-driven semantic consistency recognition method based on the fusion of Transformer and HowNet sememes knowledge.
BiLSTM is exploited to encode the conceptual semantic information and infer the semantic consistency.
arXiv Detail & Related papers (2023-02-21T09:53:19Z)
- Word Sense Induction with Hierarchical Clustering and Mutual Information Maximization [14.997937028599255]
Word sense induction (WSI) is a difficult problem in natural language processing.
We propose a novel unsupervised method based on hierarchical clustering and invariant information clustering.
We empirically demonstrate that, in certain cases, our approach outperforms prior WSI state-of-the-art methods.
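As a loose illustration of the clustering side of such a method (assumed details, not the paper's pipeline): occurrences of an ambiguous word are embedded with a contextual encoder and then grouped into induced senses by hierarchical clustering. The invariant-information-clustering component is not shown.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in contextual embeddings for 100 occurrences of one ambiguous word;
# in practice these would come from a contextual encoder such as BERT.
occurrence_vecs = np.random.randn(100, 768)

# Hierarchical (agglomerative) clustering groups occurrences into senses.
# The number of senses is fixed here for illustration only.
sense_ids = AgglomerativeClustering(n_clusters=3).fit_predict(occurrence_vecs)
print(sense_ids[:10])  # induced sense label per occurrence
```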
arXiv Detail & Related papers (2022-10-11T13:04:06Z)
- InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings [61.77760317554826]
This paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE.
We evaluate the proposed InfoCSE on several benchmark datasets w.r.t. the semantic textual similarity (STS) task.
Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large.
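For reference, the in-batch contrastive objective that SimCSE-style sentence-embedding methods (including InfoCSE's contrastive branch) rely on can be sketched as a standard InfoNCE loss. InfoCSE's information-aggregation head (an auxiliary masked-language-model network) is omitted here, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE over two views z1, z2 of the same sentence batch: matching
    rows are positives, all other in-batch pairs serve as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature        # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0))    # positives sit on the diagonal
    return F.cross_entropy(sim, labels)

# Toy usage: two dropout-augmented encodings of an 8-sentence batch.
loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```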
arXiv Detail & Related papers (2022-10-08T15:53:19Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
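To make the two paradigms concrete, here are generic scoring functions for each; these are textbook illustrations, not UnifieR's unified dual-representing model.

```python
import torch

def dense_score(q_vec: torch.Tensor, d_vec: torch.Tensor) -> torch.Tensor:
    """Dense-vector paradigm: one embedding per query/document, dot-product score."""
    return q_vec @ d_vec

def lexicon_score(q_weights: dict, d_weights: dict) -> float:
    """Lexicon-based paradigm: sparse per-term weights, summed over overlapping terms."""
    return sum(w * d_weights.get(term, 0.0) for term, w in q_weights.items())

# Toy usage of both paradigms on the same query/document pair.
s_dense = dense_score(torch.randn(128), torch.randn(128))
s_lex = lexicon_score({"neural": 1.2, "retrieval": 0.8},
                      {"retrieval": 0.9, "index": 0.4})
```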
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Exploiting Global Contextual Information for Document-level Named Entity Recognition [46.99922251839363]
We propose a model called Global Context enhanced Document-level NER (GCDoc).
At the word level, a document graph is constructed to model a wider range of dependencies between words.
At the sentence level, to model wider context beyond a single sentence, we employ a cross-sentence module.
Our model reaches an F1 score of 92.22 (93.40 with BERT) on the CoNLL 2003 dataset and 88.32 (90.49 with BERT) on the OntoNotes 5.0 dataset.
arXiv Detail & Related papers (2021-06-02T01:52:07Z)