Assessing the Impact of Sequence Length Learning on Classification Tasks for Transformer Encoder Models
- URL: http://arxiv.org/abs/2212.08399v2
- Date: Thu, 14 Mar 2024 16:49:24 GMT
- Title: Assessing the Impact of Sequence Length Learning on Classification Tasks for Transformer Encoder Models
- Authors: Jean-Thomas Baillargeon, Luc Lamontagne
- Abstract summary: Classification algorithms can be affected by the sequence length learning problem whenever observations from different classes have a different length distribution.
This problem causes models to use sequence length as a predictive feature instead of relying on important textual information.
Although most public datasets are not affected by this problem, privately owned corpora for fields such as medicine and insurance may carry this data bias.
- Score: 0.030693357740321774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification algorithms using Transformer architectures can be affected by the sequence length learning problem whenever observations from different classes have a different length distribution. This problem causes models to use sequence length as a predictive feature instead of relying on important textual information. Although most public datasets are not affected by this problem, privately owned corpora for fields such as medicine and insurance may carry this data bias. The exploitation of this sequence length feature poses challenges throughout the value chain as these machine learning models can be used in critical applications. In this paper, we empirically expose this problem and present approaches to minimize its impacts.
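To make the failure mode concrete, here is a minimal sketch of a length-bias probe: if a classifier that sees only document length beats chance by a wide margin, the corpus likely carries this bias. The synthetic corpus, length distributions, and model below are illustrative assumptions, not the paper's experimental setup.
```python
# Hypothetical probe for the sequence-length bias described above: a model
# trained on length alone should be near chance on an unbiased corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy corpus: class 0 documents tend to be shorter than class 1 documents.
lengths_class0 = rng.poisson(lam=120, size=500)
lengths_class1 = rng.poisson(lam=180, size=500)
X = np.concatenate([lengths_class0, lengths_class1]).reshape(-1, 1)
y = np.concatenate([np.zeros(500), np.ones(500)])

# Length-only classifier: accuracy far above 0.5 signals the bias.
acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"length-only accuracy: {acc:.3f}")
```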
Related papers
- KTbench: A Novel Data Leakage-Free Framework for Knowledge Tracing [0.0]
Knowledge Tracing (KT) is concerned with predicting students' future performance on learning items in intelligent tutoring systems.
Many KT models expand the sequence of item-student interactions into knowledge concept (KC)-student interactions by replacing each learning item with its constituent KCs.
This approach addresses the issue of sparse item-student interactions and minimises the number of model parameters.
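A minimal sketch of this item-to-KC expansion is shown below; the KC mapping and student sequence are invented for illustration and do not reflect KTbench's actual data format.
```python
# Each learning item is replaced by its constituting knowledge concepts (KCs),
# turning one (item, correct) pair into several (kc, correct) pairs.
item_to_kcs = {
    "q1": ["fractions", "multiplication"],
    "q2": ["fractions"],
    "q3": ["geometry", "multiplication"],
}

def expand_to_kc_interactions(interactions):
    """Expand (item, correct) pairs into (kc, correct) pairs."""
    expanded = []
    for item, correct in interactions:
        for kc in item_to_kcs[item]:
            expanded.append((kc, correct))
    return expanded

student_sequence = [("q1", 1), ("q2", 0), ("q3", 1)]
print(expand_to_kc_interactions(student_sequence))
# [('fractions', 1), ('multiplication', 1), ('fractions', 0), ...]
```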
arXiv Detail & Related papers (2024-03-22T15:54:30Z)
- Preventing RNN from Using Sequence Length as a Feature [0.08594140167290096]
Recurrent neural networks are deep learning topologies that can be trained to classify long documents.
But they can use the length differences between texts of different classes as a prominent classification feature.
This has the effect of producing models that are brittle and fragile to concept drift, can report misleading performance, and are trivially explainable regardless of text content.
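One simple way to blunt the length cue, sketched generically below (this is a common mitigation, not necessarily the approach this paper proposes), is to force every document to a fixed token length:
```python
def pad_or_truncate(tokens, target_len=128, pad="<pad>"):
    """Force every document to the same length so length carries no class signal.

    Note: real pipelines usually mask padding tokens, so this is only a sketch
    of the idea, not a complete fix on its own.
    """
    return (tokens + [pad] * target_len)[:target_len]

short_doc, long_doc = ["w"] * 40, ["w"] * 300
print(len(pad_or_truncate(short_doc)), len(pad_or_truncate(long_doc)))  # 128 128
```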
arXiv Detail & Related papers (2022-12-16T04:23:36Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose a lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
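For context, a fixed-policy augmentation loop like the sketch below is the kind of baseline this work automates and adapts; the transforms and data are illustrative assumptions, not the paper's setup.
```python
# Plain data augmentation: apply a randomly chosen transformation to each
# example so the model sees symmetry-preserving variants of the input.
import numpy as np

rng = np.random.default_rng(1)
transforms = [
    lambda x: x + rng.normal(0, 0.1, size=x.shape),  # additive jitter
    lambda x: x[::-1].copy(),                        # reversal
    lambda x: np.roll(x, shift=3),                   # circular shift
]

def augment(batch):
    """Apply one randomly chosen transformation to each example."""
    return np.stack([transforms[rng.integers(len(transforms))](x) for x in batch])

batch = rng.normal(size=(4, 16))
print(augment(batch).shape)  # (4, 16)
```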
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Determination of class-specific variables in nonparametric multiple-class classification [0.0]
We propose a probability-based nonparametric multiple-class classification method and integrate it with the ability to identify high-impact variables for individual classes.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations.
arXiv Detail & Related papers (2022-05-07T10:08:58Z)
- ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer [0.0]
Original Transformer-based models adopt an attention mechanism to discover global information along a sequence.
ChunkFormer splits the long sequences into smaller sequence chunks for the attention calculation.
In this way, the proposed model gradually learns both local and global information without changing the total length of the input sequences.
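A rough sketch of attention restricted to fixed-size chunks, in the spirit of the summary above, follows; it illustrates the general idea only and is not ChunkFormer's actual multi-stage architecture.
```python
# Self-attention computed inside non-overlapping chunks: each score matrix is
# (chunk, chunk) instead of (n, n), so cost grows linearly in sequence length.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def chunked_attention(q, k, v, chunk=64):
    """Self-attention restricted to non-overlapping chunks of the sequence."""
    n, d = q.shape
    out = np.empty_like(v)
    for s in range(0, n, chunk):
        qs, ks, vs = q[s:s+chunk], k[s:s+chunk], v[s:s+chunk]
        scores = qs @ ks.T / np.sqrt(d)
        out[s:s+chunk] = softmax(scores) @ vs
    return out

x = np.random.default_rng(0).normal(size=(512, 32))
print(chunked_attention(x, x, x).shape)  # (512, 32)
```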
arXiv Detail & Related papers (2021-12-30T15:06:32Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
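A loose illustration of per-iteration class dropping appears below; the sampling scheme and masking are schematic assumptions, not the paper's exact procedure.
```python
# Per training iteration, a random subset of classes is masked out so the
# remaining classes cannot lean on features tied to the dropped ones.
import numpy as np

rng = np.random.default_rng(2)

def drop_random_classes(logits, drop_prob=0.3):
    """Mask out a random subset of class logits for this iteration."""
    num_classes = logits.shape[-1]
    keep = rng.random(num_classes) > drop_prob
    if not keep.any():                        # always keep at least one class
        keep[rng.integers(num_classes)] = True
    masked = np.where(keep, logits, -np.inf)  # dropped classes never win
    return masked, keep

logits = rng.normal(size=(2, 10))             # 2 examples, 10 classes
masked, kept = drop_random_classes(logits)
print(kept.astype(int), masked.argmax(-1))
```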
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning [80.20302993614594]
We provide a statistical analysis to overcome drawbacks of Laplacian regularization.
We unveil a large body of spectral filtering methods that exhibit desirable behaviors.
We provide realistic computational guidelines in order to make our method usable with large amounts of data.
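As background, a textbook Laplacian-regularization sketch for semi-supervised labeling is given below; the affinity graph, bandwidth, and regularization weight are illustrative choices, and the paper's spectral-filtering refinements are not shown.
```python
# Minimize ||f_l - y_l||^2 + lam * f^T L f over label scores f, where
# L = D - W is the graph Laplacian of a Gaussian affinity graph.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = np.full(60, np.nan)          # NaN marks unlabeled points
y[:5] = 0.0
y[5:10] = 1.0

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W

# First-order conditions give a linear system in f.
labeled = ~np.isnan(y)
lam = 1.0
A = lam * L + np.diag(labeled.astype(float))
b = np.where(labeled, y, 0.0)
f = np.linalg.solve(A, b)
print((f > 0.5).astype(int)[:10])
```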
arXiv Detail & Related papers (2020-09-09T14:28:54Z)
- Learning Causal Models Online [103.87959747047158]
Predictive models can rely on spurious correlations in the data for making predictions.
One solution for achieving strong generalization is to incorporate causal structures in the models.
We propose an online algorithm that continually detects and removes spurious features.
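A purely schematic reading of "detect and remove spurious features" is sketched below; the sign-flip heuristic is an assumption made for illustration, not the paper's online algorithm.
```python
# A feature whose correlation with the label flips sign between two data
# batches is treated as spurious and masked out; stable features are kept.
import numpy as np

def stable_feature_mask(X1, y1, X2, y2):
    """Keep only features whose label correlation keeps the same sign."""
    c1 = np.array([np.corrcoef(X1[:, j], y1)[0, 1] for j in range(X1.shape[1])])
    c2 = np.array([np.corrcoef(X2[:, j], y2)[0, 1] for j in range(X2.shape[1])])
    return np.sign(c1) == np.sign(c2)

rng = np.random.default_rng(6)
X1, X2 = rng.normal(size=(100, 5)), rng.normal(size=(100, 5))
y1 = (X1[:, 0] > 0).astype(float)   # only feature 0 is truly predictive
y2 = (X2[:, 0] > 0).astype(float)
print(stable_feature_mask(X1, y1, X2, y2))
```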
arXiv Detail & Related papers (2020-06-12T20:49:20Z)
- Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting [12.0855096102517]
We present a representation learning framework for financial time series forecasting.
In this paper, we propose to first learn compact representations from time series data, then use the learned representations to train a simpler model for predicting time series movements.
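The two-stage recipe can be sketched as follows, with PCA standing in for the paper's contrastive encoder and synthetic windows standing in for market data; both substitutions are assumptions.
```python
# Stage 1: learn a compact representation; stage 2: fit a simpler model on it.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
windows = rng.normal(size=(200, 50))                   # 200 windows, 50 steps
moves = (windows[:, -1] > windows[:, 0]).astype(int)   # up/down movement label

Z = PCA(n_components=8).fit_transform(windows)         # compact representation
clf = LogisticRegression().fit(Z, moves)               # simple predictor on top
print(clf.score(Z, moves))
```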
arXiv Detail & Related papers (2020-02-18T15:24:33Z)
- Multi-label Prediction in Time Series Data using Deep Neural Networks [19.950094635430048]
This paper addresses a multi-label predictive fault classification problem for multidimensional time-series data.
The proposed algorithm is tested on two public benchmark datasets.
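A minimal illustration of the multi-label setup (independent sigmoid outputs, so several faults can be active at once) is sketched below; the toy features and weights are assumptions, not the paper's network.
```python
# Multi-label prediction uses one sigmoid per fault type rather than a single
# softmax, so each time window may trigger zero, one, or several faults.
import numpy as np

rng = np.random.default_rng(5)
features = rng.normal(size=(8, 12))     # 8 windows x 12 summary features
W = rng.normal(size=(12, 4)) * 0.1      # 4 possible fault types

logits = features @ W
probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid per label, NOT softmax
predicted = probs > 0.5
print(predicted.astype(int))
```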
arXiv Detail & Related papers (2020-01-27T21:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.