Preventing RNN from Using Sequence Length as a Feature
- URL: http://arxiv.org/abs/2212.08276v1
- Date: Fri, 16 Dec 2022 04:23:36 GMT
- Title: Preventing RNN from Using Sequence Length as a Feature
- Authors: Jean-Thomas Baillargeon, Hélène Cossette, Luc Lamontagne
- Abstract summary: Recurrent neural networks are deep learning topologies that can be trained to classify long documents.
But they can use the length differences between texts of different classes as a prominent classification feature.
This has the effect of producing models that are brittle and fragile under concept drift, can report misleading performance, and are trivially explainable regardless of text content.
- Score: 0.08594140167290096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks are deep learning topologies that can be trained to classify long documents. However, in our recent work, we found a critical problem with these cells: they can use the length differences between texts of different classes as a prominent classification feature. This has the effect of producing models that are brittle and fragile under concept drift, can report misleading performance, and are trivially explainable regardless of text content. This paper illustrates the problem using synthetic and real-world data and provides a simple solution using weight decay regularization.
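As a rough illustration of the fix the abstract describes, here is a minimal sketch (not the authors' code) of an LSTM text classifier trained with weight decay on synthetic data in which the two classes have different length distributions. The architecture, the data construction, and the weight-decay coefficient are illustrative assumptions, and PyTorch is an assumed framework choice.

```python
# Minimal sketch of weight decay regularization against sequence length
# learning (not the paper's code); model, data, and hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, lengths):
        emb = self.embedding(token_ids)
        packed = pack_padded_sequence(emb, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)   # final hidden state per sequence
        return self.fc(h_n[-1])           # class logits


def make_batch(batch_size=32, vocab_size=1000, short_len=20, long_len=60):
    """Synthetic batch where class 1 texts are systematically longer,
    so an unregularized RNN can classify on length alone."""
    labels = torch.randint(0, 2, (batch_size,))
    lengths = torch.where(labels == 0,
                          torch.full((batch_size,), short_len, dtype=torch.long),
                          torch.full((batch_size,), long_len, dtype=torch.long))
    tokens = torch.randint(1, vocab_size, (batch_size, int(lengths.max())))
    for i, n in enumerate(lengths.tolist()):
        tokens[i, n:] = 0                 # zero-pad beyond the true length
    return tokens, lengths, labels


model = LSTMClassifier(vocab_size=1000)
# AdamW applies decoupled weight decay; the coefficient is an assumed value,
# not one taken from the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = nn.CrossEntropyLoss()

for step in range(200):
    tokens, lengths, labels = make_batch()
    optimizer.zero_grad()
    loss = criterion(model(tokens, lengths), labels)
    loss.backward()
    optimizer.step()
```

One simple diagnostic, in the spirit of the paper's synthetic experiments, is to compare such a model against a baseline that sees only the sequence length; if the regularized model beats that baseline on length-matched test data, it is relying on content rather than length.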
Related papers
- Multi-label Text Classification using GloVe and Neural Network Models [0.27195102129094995]
Existing solutions include traditional machine learning methods and deep neural networks.
This paper proposes a method that combines a bag-of-words approach based on the GloVe model with a CNN-BiLSTM network.
The method achieves an accuracy rate of 87.26% on the test set and an F1 score of 0.8737, showcasing promising results.
arXiv Detail & Related papers (2023-10-25T01:30:26Z) - Hidden Classification Layers: Enhancing linear separability between classes in neural networks layers [0.0]
We investigate the impact of a training approach on deep network performance.
We propose a neural network architecture which induces an error function involving the outputs of all the network layers.
arXiv Detail & Related papers (2023-06-09T10:52:49Z) - Assessing the Impact of Sequence Length Learning on Classification Tasks for Transformer Encoder Models [0.030693357740321774]
Classification algorithms can be affected by the sequence length learning problem whenever observations from different classes have a different length distribution.
This problem causes models to use sequence length as a predictive feature instead of relying on important textual information.
Although most public datasets are not affected by this problem, privately owned corpora for fields such as medicine and insurance may carry this data bias.
arXiv Detail & Related papers (2022-12-16T10:46:20Z) - HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in Rouge F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z) - Toward the Understanding of Deep Text Matching Models for Information Retrieval [72.72380690535766]
This paper aims at testing whether existing deep text matching methods satisfy some fundamental constraints of information retrieval.
Specifically, four constraints are examined in the study, i.e., the term frequency constraint, the term discrimination constraint, the length normalization constraints, and the TF-length constraint.
Experimental results on LETOR 4.0 and MS Marco show that all the investigated deep text matching methods satisfy the above constraints with high probability.
arXiv Detail & Related papers (2021-08-16T13:33:15Z) - Leveraging Sparse Linear Layers for Debuggable Deep Networks [86.94586860037049]
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks.
The resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
arXiv Detail & Related papers (2021-05-11T08:15:25Z) - Recurrence-free unconstrained handwritten text recognition using gated fully convolutional network [2.277447144331876]
Unconstrained handwritten text recognition is a major step in most document analysis tasks.
One alternative to using LSTM cells is to compensate for the loss of long-term memory through heavy use of convolutional layers.
We present a Gated Fully Convolutional Network architecture that is a recurrence-free alternative to the well-known CNN+LSTM architectures.
arXiv Detail & Related papers (2020-12-09T10:30:13Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on the canonical task of text classification.
Despite this success, their performance can be largely jeopardized in practice because they are unable to capture high-order interactions between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) -- which obtains more expressive power with less computational cost for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - FIND: Human-in-the-Loop Debugging Deep Text Classifiers [55.135620983922564]
We propose FIND -- a framework which enables humans to debug deep learning text classifiers by disabling irrelevant hidden features.
Experiments show that, by using FIND, humans can improve CNN text classifiers trained on different types of imperfect datasets.
arXiv Detail & Related papers (2020-10-10T12:52:53Z) - Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale, well-annotated datasets.
However, labeling large-scale data can be very costly and error-prone, so it is difficult to guarantee annotation quality.
We propose a Temporal Calibrated Regularization (TCR) in which we utilize the original labels together with the predictions from the previous epoch.
arXiv Detail & Related papers (2020-07-01T04:48:49Z)