Hierarchical Neural Network Approaches for Long Document Classification
- URL: http://arxiv.org/abs/2201.06774v1
- Date: Tue, 18 Jan 2022 07:17:40 GMT
- Title: Hierarchical Neural Network Approaches for Long Document Classification
- Authors: Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi
- Abstract summary: We employ the pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently.
Our proposed models are conceptually simple: we divide the input data into chunks and pass them through the base BERT and USE models.
We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart.
- Score: 3.6700088931938835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification algorithms investigate the intricate relationships
between words or phrases and attempt to deduce the document's interpretation.
In the last few years, these algorithms have progressed tremendously.
Transformer architecture and sentence encoders have proven to give superior
results on natural language processing tasks. However, a major limitation of these architectures is that they apply only to text no longer than a few hundred words. In this paper, we explore hierarchical transfer learning approaches for
long document classification. We employ pre-trained Universal Sentence Encoder
(USE) and Bidirectional Encoder Representations from Transformers (BERT) in a
hierarchical setup to capture better representations efficiently. Our proposed models are conceptually simple: we divide the input data into chunks and pass them through the base BERT and USE models. The output representation for each chunk is then propagated through a shallow neural network comprising LSTMs or CNNs to classify the text data. These extensions are evaluated on 6 benchmark datasets. We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart. However, the hierarchical BERT models are still desirable, as they avoid the quadratic complexity of the attention mechanism in BERT. Along with the hierarchical approaches, this work also provides a
comparison of different deep learning algorithms like USE, BERT, HAN,
Longformer, and BigBird for long document classification. The Longformer
approach consistently performs well on most of the datasets.
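The hierarchical setup described above can be sketched in a few lines: split the document into fixed-size chunks, encode each chunk independently, and feed the resulting sequence of chunk vectors to a shallow classifier head. The encoder below is a toy stand-in for BERT/USE, and all function names are illustrative, not from the paper.

```python
from typing import Callable, List

def chunk_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a long token sequence into consecutive fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def hierarchical_features(
    tokens: List[str],
    encode_chunk: Callable[[List[str]], List[float]],
    chunk_size: int = 512,
) -> List[List[float]]:
    """Encode each chunk independently; the resulting sequence of chunk
    vectors would feed a shallow LSTM/CNN classification head."""
    return [encode_chunk(chunk) for chunk in chunk_tokens(tokens, chunk_size)]

# Toy stand-in encoder: mean token length as a 1-d "embedding".
toy_encoder = lambda chunk: [sum(len(t) for t in chunk) / len(chunk)]

doc = ["word"] * 1200  # a document longer than one 512-token BERT window
feats = hierarchical_features(doc, toy_encoder, chunk_size=512)
print(len(feats))  # 3 chunks: 512 + 512 + 176 tokens
```

Because each chunk is encoded independently, the quadratic attention cost is paid only within a chunk, which is the efficiency argument the abstract makes for the hierarchical BERT models.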
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - Neural Architecture Search for Sentence Classification with BERT [4.862490782515929]
We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost.
We validate our classification architecture on a variety of NLP benchmarks from the GLUE dataset.
arXiv Detail & Related papers (2024-03-27T13:25:43Z) - Breaking the Token Barrier: Chunking and Convolution for Efficient Long
Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose a relatively simple extension to the vanilla BERT architecture, called ChunkBERT, that allows fine-tuning of any pretrained model to perform inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z) - A multi-model-based deep learning framework for short text multiclass
classification with the imbalanced and extremely small data set [0.6875312133832077]
This paper proposes a multi-model-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small data set.
It retains the state-of-the-art baseline performance in terms of precision, recall, accuracy, and F1 score.
arXiv Detail & Related papers (2022-06-24T00:51:02Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
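The identifier scheme can be made concrete with a short sketch: enumerate every n-gram occurring in a passage, each of which may then serve as one of the passage's identifiers. The helper name below is hypothetical, not from the paper.

```python
from typing import List, Set

def ngram_identifiers(tokens: List[str], max_n: int = 3) -> Set[str]:
    """Enumerate all n-grams (up to max_n tokens) in a passage; under the
    proposed scheme, each one is a valid identifier for the passage."""
    ids = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ids.add(" ".join(tokens[i:i + n]))
    return ids

passage = "autoregressive models generate answers".split()
print(sorted(ngram_identifiers(passage, max_n=2)))
```

Unlike a hierarchical partition of the search space, this identifier set is flat and overlapping: many distinct n-grams point at the same passage.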
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Comparative Study of Long Document Classification [0.0]
We revisit long document classification using standard machine learning approaches.
We benchmark approaches ranging from simple Naive Bayes to complex BERT on six standard text classification datasets.
arXiv Detail & Related papers (2021-11-01T04:51:51Z) - HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text
Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in Rouge F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z) - Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model which models the composition of programs and maps a program to an utterance.
Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand.
We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider.
arXiv Detail & Related papers (2021-04-12T21:24:02Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
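The progressive insertion idea above can be illustrated with a toy sketch: within each round, a proposal function (here a fixed lookup table standing in for POINTER's learned model) may insert a token between every adjacent pair in parallel, refining a coarse draft into a finer one.

```python
from typing import Callable, List, Optional

def progressive_insert(
    tokens: List[str],
    propose: Callable[[str, str], Optional[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Repeatedly scan adjacent token pairs and, in parallel within each
    round, insert a proposed token between a pair; stop when a round
    produces no insertions."""
    for _ in range(max_rounds):
        out, inserted = [], False
        for left, right in zip(tokens, tokens[1:]):
            out.append(left)
            mid = propose(left, right)
            if mid is not None:
                out.append(mid)
                inserted = True
        out.append(tokens[-1])
        tokens = out
        if not inserted:
            break
    return tokens

# Toy proposal rule: fill one fixed gap between two constraint tokens.
rules = {("cat", "mat"): "on the"}
result = progressive_insert(["cat", "mat"], lambda l, r: rules.get((l, r)))
print(" ".join(result))  # prints: cat on the mat
```

Each round corresponds to one level of the coarse-to-fine hierarchy: the hard-constrained tokens are fixed first, and the gaps between them are filled in later rounds.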
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical
Encoder for Long-Form Document Matching [28.190001111358438]
We propose the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching.
Our model contains several innovations to adapt self-attention models for longer text input.
We will open source a Wikipedia based benchmark dataset, code and a pre-trained checkpoint to accelerate future research on long-form document matching.
arXiv Detail & Related papers (2020-04-26T07:04:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.